Measures of information transfer have become a popular approach to analyze interactions in complex
systems such as the Earth or the human brain from measured time series. Recent work has focused on
causal definitions of information transfer aimed at decompositions of predictive information about a
target variable, while excluding effects of common drivers and indirect influences. While common
drivers clearly constitute a spurious causality, the aim of the present article is to develop measures
quantifying different notions of the strength of information transfer along indirect causal paths,
based on first reconstructing the multivariate causal network (Tigramite approach). Another class of
novel measures quantifies to what extent different intermediate processes on causal paths contribute
to an interaction mechanism to determine pathways of causal information transfer. The proposed
framework complements predictive decomposition schemes by focusing more on the interaction mechanism
between multiple processes. A rigorous mathematical framework allows for a clear
information-theoretic interpretation that can also be related to the underlying dynamics as proven
for certain classes of processes. Generally, however, estimates of information transfer remain hard
to interpret for nonlinearly intertwined complex systems. But, if experiments or mathematical models
are not available, measuring pathways of information transfer within the causal dependency structure
allows at least for an abstraction of the dynamics. The measures are illustrated on a climatological
example to disentangle pathways of atmospheric flow over Europe.


I. INTRODUCTION 


The availability of vast amounts of time series data from such complex systems as the Earth or the
human brain and body has given rise to a plethora of time series analysis methods aimed at
understanding interactions between regions or subprocesses in these complex systems. Of particular
interest are methods to quantify some notion of information flow or information transfer within the
complex system. In neuroscience [?] and climate research [?], such interpretations have often been
based on pure pairwise correlation analyses. But towards measuring information transfer, the method
should, firstly, be general enough to also include nonlinear associations. This can be achieved in
an information-theoretic framework with measures such as mutual information (MI) [?]. Secondly,
networks reconstructed from pairwise measures of association (be it cross-correlation or MI) do not
allow one to assess the propagation of information or hypothetical perturbations in a causal sense:
For example, an interaction like $X \leftarrow Z \to Y$ would imply that X and Y are correlated even
though no perturbations originating in X can actually reach Y, or vice versa.
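
The common-driver situation can be made concrete with a small simulation. The following is a minimal sketch under assumed settings (linear couplings of strength one, unit-variance Gaussian noise, a fixed seed); none of these choices come from the article. It shows that X and Y are clearly correlated although neither enters the other's equation, and that the association vanishes once Z is accounted for.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly removing the influence of c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

rng = random.Random(0)  # fixed seed, illustrative only
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # common driver Z
x = [zi + rng.gauss(0, 1) for zi in z]    # X depends on Z only
y = [zi + rng.gauss(0, 1) for zi in z]    # Y depends on Z only

r_xy = pearson(x, y)                  # theoretical value 0.5
r_xy_given_z = partial_corr(x, y, z)  # theoretical value 0
print(f"corr(X, Y)     = {r_xy:.3f}")
print(f"corr(X, Y | Z) = {r_xy_given_z:.3f}")
```

A pairwise network built from `r_xy` alone would contain a link between X and Y, which is exactly the kind of spurious association a causal reconstruction is meant to remove.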

An important step towards deeper insights has, therefore, been achieved by methods that are capable
of inferring a statistical notion of directionality or even causal interactions, which have been
applied to the climate system [?], the human brain [?], and to disentangle cardiovascular processes
[?], among others. Causal associations between subprocesses can be visualized as links in a complex
interaction network. A full causal reconstruction of a link $X \to Y$ can only be achieved under the
assumption, unrealistic in most cases, that all possible other influences on X and Y can be included
in the analysis [?], or if the system can be experimentally manipulated within Pearl’s causal effect
framework [?]. Usually, it is impossible to exclude all other influences and large complex systems
can typically not be easily experimentally manipulated. Causal inference based on data-analysis
methods, therefore, provides only a first step and the term “causal” can then only be understood
relative to the system under study, i.e., the processes that comprise the nodes of the network.

Two tasks need to be addressed to measure a causal notion
of information transfer from time series of complex systems:

1. Reconstructing the causal network, 


2. Quantifying causal information transfer. 


In this article we focus on the quantification part; the reconstruction problem has been addressed
by the author in Ref. [20]. As further reviewed below, previous works have mainly considered a
decomposition of the predictive information in direct drivers of a process Y. In the present
article, we ask a different question: How does information originating in a process X propagate also
on indirect paths through the causal interaction network? How strong is it and which intermediate
processes on causal pathways are contributing to such a mechanism?

The paper is organized as follows: In the remainder of this introductory section, we review recent
approaches to measuring information transfer in complex systems and sketch the basic idea underlying
the present approach. In Sect. II we recall basic concepts of information theory and in Sect. III
introduce the concept of time series graphs as the causal basis of the present approach. In Sect. IV
we introduce the novel measures based on time series graphs to quantify interactions along paths and
mediation and distinguish them from transfer entropy-related approaches. In Sect. V we extensively
analyze the measures with analytical and numerical examples and provide theorems that foster a more
rigorous mathematical and dynamical understanding to facilitate the interpretability of the proposed
measures. Section VI discusses the theoretical results and relation to linear measures of causal
effect in Pearl’s framework [?], and gives an outlook to applications of the novel measures in
complex network theory. Finally, Sect. VII gives an illustrative application to climatological time
series and Sect. VIII concludes the paper. The appendix contains proofs of the theorems.


A. Quantifying causal information transfer 

Compared to the first task of detecting causal interactions, more or less a binary question, the
second task of quantifying causal information transfer is much more ambiguous to define in a
universal way, which has led Smirnov [22, 23] to question the goal of assessing a “causal coupling
strength” and instead measure “how the coupling manifests itself in the dynamics” in an
interventional-effect causal framework as proposed by Pearl [?]. In Ref. [24] the term ‘information
transfer’ is even distinguished from ‘information flow’ where the latter is meant in a causal sense
based on interventions. This framework, however, necessitates either experimentally manipulating the
system, or having a mathematical model to perform “virtual interventions”. To some extent causal
effects can also be extracted if the time series cover the whole state space or attractor of the
complex system [?] such that virtual interventions can be drawn ‘randomly’ from the stationary
distribution. In a mathematical model the strength of a coupling mechanism can often be related to
model coefficients and a plethora of methods exists that implement the model-based concept of
Granger causality [?]. These range from classical linear autoregressive models in the form of the
directed transfer function [?], to slightly less restrictive approaches such as partial directed
coherence using spectral estimators [?], extended Granger causality with local linear embeddings in
phase space [?], or kernel estimators [34], to name just a few. All these approaches still involve
strong assumptions about the dependencies and share the problem that the model might be
misspecified. This implies that the model may not adequately represent important interactions such
as the complicated interplay between El Niño Southern Oscillation and the Indian Monsoon in the
climate system [?] or neural interactions where even a fully physical model is lacking.

If it is not possible to measure “how the coupling manifests itself in the dynamics”,
information-theoretic quantifiers can at least help to measure “how the causal coupling manifests
itself in the exchange of entropy between the subprocesses” in an information-theoretic framework
capturing almost any form of statistical association. Here ‘causal’ is meant relative to the
observed process as discussed above. This approach aims to distinguish different contributions based
on the Markovian conditional independence structure of the multivariate process as an abstraction of
the dynamics.

There are few works considering multivariate definitions of information transfer and their
interpretation. In Ref. [?], the central concept is to decompose the predictive information about
the next time step of a subprocess Y into the MI between Y and its own past as the information
storage, the partial transfer entropy (TE) from another subprocess X, and the TE between Y and the
remaining process. In Refs. [?] another decomposition is proposed to detect redundant and synergetic
contributions of driving variables. Liang [?, 40] presents a rigorous approach based on the
underlying Langevin description of a system to define the contributions of internal and external
driving to the evolution of the entropy of a subprocess Y. This approach is based on knowledge of
the deterministic-stochastic equations of the system, but in principle it can also be estimated from
time series alone, which involves numerical optimization problems. In Refs. [41, 42] an idea is
described that is similar to the present approach in that the question of quantifying the strength
of links is seen as a second step based on the known causal network. Ay et al. [41] address the
problem from an interventionalist perspective using Pearl’s do-calculus [?], which we do not further
discuss here since we assume the process not to be manipulable. Janzing et al. [42] define the
strength of a link $X \to Y$ by considering the thought experiment of an attacker ‘cutting the link’
and feeding in the distribution of X as an input, arriving at a measure that is no longer a
conditional mutual information, the quantity we use here to measure the transfer of information.
Also, the authors state that it is difficult to quantify indirect effects as well in their
framework. In general, there are different ways to define measures and different research questions
demand different properties.


B. The idea of momentary information 


The approach to measures of causal information transfer formally introduced in Sect. IV is based on
the fundamental concept of source entropy, also termed the entropy rate [?], and was introduced for
the special case of bivariate ordinal pattern time series in Ref. [?]. Consider a symbol-generating
process X. At each time t a realization $x_t$ is generated. Now the source entropy of $X_t$ measures
the uncertainty about $X_t$ before its observation if all former observations
$(x_{t-1}, x_{t-2}, \ldots)$ are known (entropies will be formally introduced in Sect. II). For a
completely deterministic non-chaotic system the source entropy will always be zero, but for a real
world process there will always be some uncertainty stemming from dynamical noise. This type of
noise is to be distinguished from observational noise which usually contaminates each measured time
series [?], but has no effect on the dynamics of the process. Dynamical noise might occur due to
unresolved smaller-scale processes and can be modeled by including a random variable in the system.
More formally, consider a subprocess X of a multivariate process $\mathbf{X}$ with infinite past
$\mathbf{X}_t^- = (\mathbf{X}_{t-1}, \mathbf{X}_{t-2}, \ldots)$, that is described by the
discrete-time equation


\[ X_t = f(Z_{1,t-\tau_1}, Z_{2,t-\tau_2}, \ldots, \eta^X_t), \qquad (1) \]

with some arbitrary function $f$ of other subprocesses at past times
$Z_{1,t-\tau_1}, Z_{2,t-\tau_2}, \ldots \in \mathbf{X}_t^-$ and the random part subsumed under
$\eta^X_t$. The uncertainty of an outcome $X_t$ will on average be reduced if a realization of the
past $Z_{1,t-\tau_1}, Z_{2,t-\tau_2}, \ldots$ is known. But for non-zero $\eta^X_t$ there will
always be some “surprise” left when observing $X_t$. This surprise gives us information and the
expected information here is the source entropy $H(X_t \mid \mathbf{X}_t^-)$ of X. If the dynamical
noise $\eta^X_t$ occurs additively in Eq. (1), then $H(X_t \mid \mathbf{X}_t^-) = H(\eta^X_t)$. Due
to measurement errors or observational noise $\epsilon$, we will in general not be able to estimate
the source entropy alone, but only the entropy of the noise-contaminated observations,
$H(X_t + \epsilon_t \mid \mathbf{X}_t^- + \boldsymbol{\epsilon}_t^-)$. Even assuming a perfect
measurement apparatus for a deterministic dynamical system without dynamical noise, the entropy rate
depends on some resolution parameter $r$, since it is computed by creating a symbol sequence from a
coarse graining in phase space. The limit of the entropy rate for $r \to 0$ might then exist and is
called the Kolmogorov-Sinai entropy. If this limit is finite and larger than zero, the system is
called chaotic. But here we study stochastic, discrete-time processes because the finite set of
measured variables of a complex system like the Earth will never perfectly describe the full
system’s state and all remaining processes contribute to dynamical noise (implying that the
Kolmogorov-Sinai entropy diverges).

Figure 1. (Color online) Consider a realization of dynamical noise $\eta^X_t$ driving subprocess X
as a perturbation. Coupling mechanisms along different causal paths (black lines) transform such a
perturbation, and the total effect on Y some time later can also depend on how intermediate
processes nonlinearly interact with each other as shown in Sect. V B. The central idea of the
momentary information transfer measures presented in this article is to information-theoretically
quantify the general effect of such perturbations and isolate it from common drivers in the past
such as $Z_2$, but also $Z_1$ and the past of X. To also quantify how much intermediate processes
such as $(W_1, W_2)$ on causal paths mediate information, it will also be important to exclude
common drivers like $Z_3$.

While the focus in Refs. [?] and related works is on decompositions of predictive information on the
basis of transfer entropy as an information-theoretic generalization of Granger causality, the
concept here is more similar to Sims causality, see, e.g., [?], which takes into account not only
direct, but also indirect causal effects. Sims causality is based on measuring to what extent X at
time t helps in predicting Y at times $t' > t$ in the future excluding the past of X and also the
present of all other processes, i.e., $(\mathbf{X}_t, \mathbf{X}_{t-1}, \ldots)$. In model (1),
excluding the past essentially isolates the dynamical noise $\eta^X_t$ and our goal is now to
quantify the information transfer emanating from $\eta^X_t$ into the future (Fig. 1).


With this central idea we define two pairs of measures for two purposes: (1) to quantify the
information transfer between two causally linked processes and along causal paths and (2) the
mediation of intermediate processes. For each of these tasks we define two measures quantifying
different notions of information transfer: Both have in common the above idea to extract information
originating in process X only at the lagged time $t - \tau$ and are conditioned in order to measure
only information transfer along causal paths. These measures, thus, complement alternative
decomposition approaches such as in Refs. [?, ?, 39]. The second measure further attempts to exclude
the influence of other drivers of Y or intermediate path nodes to isolate the whole causal
information pathway and fulfill a generalized property of coupling strength autonomy as proposed in
previous work [48]. In the present context the property of coupling strength autonomy demands that
the measure should be uniquely determined by the interaction of the two processes alone (X and Y in
the previous example, and possibly intermediate processes W), in a way that is autonomous of how
these are driven by the remaining processes. To understand this, consider a simple example: Suppose
we have two interacting processes X and Y and a third process Z that drives both of them. Then a
bivariate measure of coupling strength between X and Y such as MI will be influenced by the common
input of Z, while our demand is that the measure should be autonomous of the interactions of X and Y
with Z.


In summary, this paper generalizes the idea underlying Ref. [48] to use the reconstructed causal
network for quantifying general causal interactions. This framework is called the Tigramite approach
(Time series Graph based Measures of Information Transfer), which is also the name of the
accompanying software package (available on the author’s website). Table I gives an overview of
different ways to use the time series graph for defining causal information transfer measures.


Pearl [?] defines the causal effect of X on Y by the hypothetical intervention of experimentally
setting a variable X to a certain value x. Then the post-interventional distribution
$P(Y = y \mid do(X = x))$, which involves the do-operator and is not the same as the conditional
distribution, is used to assess whether and in what way X affects Y. As mentioned before, however,
we assume a non-manipulable complex system and, therefore, study a weaker notion of causality. From
observational data alone, causal effects can only be estimated (or identified) under certain
assumptions about the underlying process and the kind of interventions. In Sect. VI A we discuss
Pearl’s causal effect for linear models.
Figure 2. (Color online) Venn diagrams of (a) mutual information, (b) conditional mutual
information, (c) positive interaction information, and (d) negative interaction information. The
latter case, where the entropies of X and Z do not ‘overlap’ anymore, demonstrates that the analogy
between entropies and sets should not be overinterpreted.


II. INFORMATION-THEORETIC PRELIMINARIES 

A. Conditional mutual information 

The most important information-theoretic measure on which the quantities discussed in this article
are based is the conditional mutual information (CMI) given by

\[ I(X;Y|Z) = H(Y|Z) - H(Y|X,Z) = H(X|Z) - H(X|Y,Z) \qquad (2) \]
\[ = \int p(z) \iint p(x,y|z) \ln \frac{p(x,y|z)}{p(x|z)\,p(y|z)} \, dx \, dy \, dz, \qquad (3) \]

with Shannon’s entropy H [43, 44] as a measure of the uncertainty about outcomes of a process.
Mutual information (MI), on the other hand, is a measure of the reduction of this uncertainty if
another process is measured, and CMI can be phrased as the MI between X and Y that is not contained
in a third variable Z. Here we use the natural logarithm to measure CMI and derived measures in
nats. Note that X, Y, and Z can also be vectors. Just like MI, CMI is non-negative (which can be
shown using Jensen’s inequality [?] and holds for the continuous as well as the discrete case) and
symmetric in its first two arguments, $I(X;Y|Z) = I(Y;X|Z)$. Further, according to Eq. (3), CMI
measures the Kullback-Leibler distance [?] between the distribution $p(x,y|z)$ and the distribution
for the independent case $p(x|z)\,p(y|z)$ and is zero if and only if X and Y are independent
conditionally on Z. This property makes CMI especially useful to measure conditional independence as
needed in the definition and estimation of causal graphs (Sect. III). Figures 2(a) and (b) visualize
MI and CMI in Venn diagrams as a difference of conditional entropies. In this representation the
symmetry in the arguments is also obvious.
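
For discrete variables the integrals in the definition of CMI become sums, and the quantity can be evaluated exactly from a joint probability table. The sketch below is illustrative only: the helper names `cmi` and `flip` and the toy distribution (two noisy binary copies of a fair coin Z, each flipped with probability 0.1) are assumptions, not taken from the article. It shows a case where X and Y share information unconditionally, while the CMI given Z is zero because X and Y are conditionally independent.

```python
import math
from collections import defaultdict

def cmi(pxyz):
    """I(X;Y|Z) in nats for a joint pmf given as {(x, y, z): probability}."""
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in pxyz.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
    total = 0.0
    for (x, y, z), p in pxyz.items():
        if p > 0:
            # p(x,y|z) / (p(x|z) p(y|z)) = p(x,y,z) p(z) / (p(x,z) p(y,z))
            total += p * math.log(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
    return total

def flip(bit, out, eps):
    """P(out | bit) for a binary channel flipping its input with probability eps."""
    return 1 - eps if out == bit else eps

eps = 0.1  # assumed flip probability
joint = {(x, y, z): 0.5 * flip(z, x, eps) * flip(z, y, eps)
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}

# Marginalise Z out by mapping it to a constant: I(X;Y|const) = I(X;Y).
marginal = defaultdict(float)
for (x, y, z), p in joint.items():
    marginal[(x, y, 0)] += p

mi_xy = cmi(marginal)   # about 0.22 nats: X and Y share information through Z
cmi_xy_z = cmi(joint)   # zero up to rounding: X and Y are independent given Z
print(f"I(X;Y)   = {mi_xy:.4f} nats")
print(f"I(X;Y|Z) = {cmi_xy_z:.4f} nats")
```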

B. Interaction information 

Just like MI and CMI are differences of conditional entropies, the difference of CMIs also has an
interesting interpretation that we will utilize to measure the effect of one random variable on the
interaction between two others. Such a measure has been studied in Refs. [?] under the name multiple
information. We use the term interaction information with the symbol $\mathcal{I}$, which is
symmetrically defined as

\[ \mathcal{I}(X;Y;Z) = I(X;Y) - I(X;Y|Z) \qquad (4) \]
\[ = I(Y;Z) - I(Y;Z|X) \]
\[ = I(Z;X) - I(Z;X|Y). \]

In Refs. [?] this quantity is defined with the signs reversed, but the above definition is more
consistent with the definition of CMI in Eq. (2). It is also straightforward to define the
conditional interaction information

\[ \mathcal{I}(X;Y;Z|W) = I(X;Y|W) - I(X;Y|Z,W). \qquad (5) \]

Contrary to CMI, the (conditional) interaction information can also be negative and is bounded by

\[ -\min\big(I(X;Y|Z,W),\, I(Y;Z|X,W),\, I(Z;X|Y,W)\big) \leq \mathcal{I}(X;Y;Z|W) \leq \min\big(I(X;Y|W),\, I(Y;Z|W),\, I(Z;X|W)\big). \qquad (6) \]

The possible negativity also shows that the visualization in Fig. 2(c) as sets in Venn diagrams
should not be overinterpreted. In Fig. 2(d) a case is shown where X and Z are unconditionally
independent, but conditionally dependent leading to $I(X;Z|Y) > I(X;Z)$ and, therefore, a negative
interaction information. That this property can actually be intuitively understood will be studied
in examples in Sect. V.
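
A standard toy case for negative interaction information is the exclusive-or of two independent fair coins. The sketch below is an illustration under that assumption, not an example from the article. It evaluates the interaction information through the equivalent entropy expansion H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z), which follows from writing I(X;Y) - I(X;Y|Z) in terms of entropies and makes the symmetry in all three arguments explicit.

```python
import math
from itertools import combinations

def entropy(pmf, axes):
    """Shannon entropy in nats of the marginal of pmf over the given axes."""
    marg = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[a] for a in axes)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log(p) for p in marg.values() if p > 0)

def interaction_information(pmf):
    """I(X;Y) - I(X;Y|Z) for a pmf over (x, y, z), via the entropy expansion."""
    h1 = sum(entropy(pmf, (a,)) for a in range(3))
    h2 = sum(entropy(pmf, pair) for pair in combinations(range(3), 2))
    h3 = entropy(pmf, (0, 1, 2))
    return h1 - h2 + h3

# X and Z are independent fair coins, Y = X xor Z (outcomes ordered as (x, y, z)).
xor_pmf = {(x, x ^ z, z): 0.25 for x in (0, 1) for z in (0, 1)}
# All three variables are copies of one fair coin.
copy_pmf = {(b, b, b): 0.5 for b in (0, 1)}

ii_xor = interaction_information(xor_pmf)    # -ln 2: conditioning creates dependence
ii_copy = interaction_information(copy_pmf)  # +ln 2: conditioning removes dependence
print(f"xor:  {ii_xor:+.4f} nats")
print(f"copy: {ii_copy:+.4f} nats")
```

In the exclusive-or case any two of the variables are independent, yet each pair becomes fully dependent once the third is known, which is why the analogy with overlapping sets breaks down.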

C. Estimation of (conditional) mutual information 

In the examples and applications we use a nearest-neighbor estimator [?, 57] that is most suitable
for variables taking on a continuous range of values and has much less bias than the commonly used
binning estimators. This estimator has as a free parameter the number of nearest neighbors k, which
determines the size of hyper-cubes around each (high-dimensional) sample point. Small values of k
lead to a lower estimation bias but higher variance and vice versa. For independence tests, a higher
k with lower variance is more important, while for estimates of the CMI value a smaller k is
recommended. Note that for an estimation from (multivariate) time series stationarity is required.
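
To indicate how a nearest-neighbor CMI estimate can be computed, the following brute-force sketch implements one common digamma-based form: the distance to the k-th neighbor in the joint space (maximum norm) fixes a hyper-cube around each point, and the numbers of points falling inside it in the subspaces (X,Z), (Y,Z) and Z enter through the digamma function. That this matches the estimator of the cited references in every detail is an assumption; the function names, the univariate inputs, the sample size and the test model are all illustrative choices.

```python
import random

EULER_GAMMA = 0.5772156649015329

def digamma_table(n_max):
    """psi(m) for integers m = 1..n_max, using psi(m) = -gamma + sum_{j<m} 1/j."""
    table = [None, -EULER_GAMMA]
    for m in range(2, n_max + 1):
        table.append(table[-1] + 1.0 / (m - 1))
    return table

def knn_cmi(x, y, z, k=5):
    """Nearest-neighbor estimate of I(X;Y|Z) in nats for univariate samples."""
    n = len(x)
    psi = digamma_table(n)
    total = 0.0
    for i in range(n):
        dx = [abs(x[i] - x[j]) for j in range(n)]
        dy = [abs(y[i] - y[j]) for j in range(n)]
        dz = [abs(z[i] - z[j]) for j in range(n)]
        # Max-norm distance to the k-th nearest neighbor in the joint (x, y, z) space.
        joint = sorted(max(dx[j], dy[j], dz[j]) for j in range(n) if j != i)
        eps = joint[k - 1]
        # Points strictly inside the hyper-cube in each subspace (point i included).
        k_xz = sum(1 for j in range(n) if max(dx[j], dz[j]) < eps)
        k_yz = sum(1 for j in range(n) if max(dy[j], dz[j]) < eps)
        k_z = sum(1 for j in range(n) if dz[j] < eps)
        total += psi[k_z] - psi[k_xz] - psi[k_yz]
    return psi[k] + total / n

rng = random.Random(1)  # fixed seed, illustrative only
n = 300
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y_indep = [zi + rng.gauss(0, 1) for zi in z]      # X and Y independent given Z
y_dep = [xi + 0.5 * rng.gauss(0, 1) for xi in x]  # Y driven directly by X

cmi_indep = knn_cmi(x, y_indep, z)  # true value 0
cmi_dep = knn_cmi(x, y_dep, z)      # true value -0.5 * ln(0.2), about 0.80 nats
print(f"independent given Z: {cmi_indep:.3f}")
print(f"direct dependence:   {cmi_dep:.3f}")
```

The double loop costs O(n^2) per estimate and is only meant to expose the logic; a practical implementation would use a spatial search tree for the neighbor queries.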
III. TIME SERIES GRAPHS AND CAUSAL PATHS

The framework proposed here, which uses the reconstructed causal network for quantifying general
causal interactions (Tigramite approach), is based on the concept of time series graphs and causal
paths as defined in the following.

A. Time series graphs 

A time series graph [58, 59] is a certain type of graphical model [60] for the case of time-ordered
data and visualizes the Markovian conditional independence properties of a multivariate
time-dependent process, i.e., how the joint density of the multivariate process $\mathbf{X}$
(including its lags) factorizes. Figures 3(a,b) show examples. Each node in a time series graph
represents a subprocess of a multivariate discrete-time process $\mathbf{X}$ at a certain time t.
Directed links between subprocesses (or nodes) $X_{t-\tau}$ and $Y_t$ for $\tau > 0$ are marked by
an arrow and defined by

\[ X_{t-\tau} \to Y_t \iff I(X_{t-\tau}; Y_t \mid \mathbf{X}_t^- \setminus \{X_{t-\tau}\}) > 0, \qquad (7) \]

with infinite past $\mathbf{X}_t^- = (\mathbf{X}_{t-1}, \mathbf{X}_{t-2}, \ldots)$, i.e., if they
are not independent conditionally on the past of the whole process, which implies a lag-specific
Granger causality with respect to $\mathbf{X}$. If $Y \neq X$ we say that the link
$X_{t-\tau} \to Y_t$ represents a coupling or cross-link at lag $\tau$, while for $Y = X$ it
represents an autodependency or auto-link at lag $\tau$.

Since contemporaneous associations are often also of interest, we also define links between $X_t$
and $Y_t$ as in previous works [?, 48] by

\[ X_t - Y_t \iff I(X_t; Y_t \mid \mathbf{X}_{t+1}^- \setminus \{X_t, Y_t\}) > 0, \qquad (8) \]

where also the contemporaneous present $\mathbf{X}_t \setminus \{X_t, Y_t\}$ is included in the
condition. Note that stationarity implies that $X_{t-\tau} \to Y_t$ whenever
$X_{t'-\tau} \to Y_{t'}$ for any $t'$ and correspondingly for contemporaneous links. In Ref. [59]
another version of contemporaneous links is also defined, marked by a dashed line:

\[ X_t \;\text{- - -}\; Y_t \iff I(X_t; Y_t \mid \mathbf{X}_t^-) > 0. \qquad (9) \]

In the case of a multivariate autoregressive process, the latter definition corresponds to non-zero
entries in the covariance matrix of the innovations, while the former corresponds to non-zero
entries in the inverse covariance matrix [?]. One problem with Definition (8) is that it can
potentially cause spurious links if, e.g., $X_t$ and $Y_t$ are independent (also of the past), but
both causally drive another process $Z_t$ instantaneously, i.e., at the same time t, which might not
be resolved due to a too coarse time sampling interval. Then $I(X_t; Y_t) = 0$, but
$I(X_t; Y_t \mid Z_t) > 0$ due to the ‘conditioning on a common child’ effect, see e.g. [?], which
is shown in Fig. 2(d). In this work, we are not considering instantaneous causal effects, but to
circumvent this problem in practice, one can consider contemporaneous effects only if both
Definitions (8) and (9) are satisfied. Note that both definitions result in slight differences in
the definition of open and blocked paths through contemporaneous links as discussed further below.
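
The difference between the two contemporaneous link definitions, one conditioning on the contemporaneous present and one not, can be checked on a tiny Gaussian example of the common-child situation. The numbers below are assumptions chosen for illustration: X and Y are independent with unit variance, and Z = X + Y + noise with unit noise variance, with no dependence on the past. The covariance entry for (X, Y) is then zero, while the corresponding entry of the inverse covariance matrix is not.

```python
from fractions import Fraction

def invert(matrix):
    """Exact inverse of a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Covariance of (X, Y, Z) with Z = X + Y + noise (all variances equal to one).
cov = [[1, 0, 1],
       [0, 1, 1],
       [1, 1, 3]]
prec = invert(cov)

print("cov(X, Y)  =", cov[0][1])   # 0: no dashed link between X and Y
print("prec(X, Y) =", prec[0][1])  # non-zero: spurious solid link when conditioning on Z
```

Requiring both definitions to hold, as suggested above, would discard the X-Y link in this example because the covariance entry vanishes.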

In Refs. [20, 62] a consistent algorithm for the estimation of the above-defined time series graphs
by iteratively inferring the parents and, in a second step, also the neighbors is discussed. This
challenging problem is not further addressed here and involves demands such as consistency (i.e.,
that the algorithm converges to the true graph for infinite sample sizes), statistical power,
underlying assumptions (e.g., faithfulness [?]), or computational complexity (partly addressed in
Ref. [63]).

B. Causal paths 

The measures introduced in Sect. IV are CMIs based on paths and different sets of conditions which
we determine from the sets of parents and neighbors of a node $Y_t$ defined, respectively, as

\[ \mathcal{P}_{Y_t} = \{Z_{t-\tau} : Z \in \mathbf{X},\ \tau > 0,\ Z_{t-\tau} \to Y_t\}, \qquad (10) \]
\[ \mathcal{N}_{Y_t} = \{X_t : X \in \mathbf{X},\ X_t - Y_t\}. \qquad (11) \]

Our main interest lies in causal paths in the time series graph which are defined as directed paths,
i.e., containing only motifs $\to \bullet \to$ (assuming that the arrow of time in the time series
graph goes to the right). But there are also other paths on which information is shared even though
no causal interventions could ‘travel’ along these. In general [59], in the above-defined time
series graph with solid contemporaneous links a path between two nodes u and v is called open if it
contains only the motifs $\to \bullet \to$, $\leftarrow \bullet \to$, $- \bullet \to$, or
$- \bullet -$.

On the other hand, if any motif on a path is $\to \bullet \leftarrow$ or $\to \bullet -$, the path
is blocked. Nodes in such motifs are also called colliders. If we now consider a separating or
conditioning set S, openness and blockedness of conditioned motifs reverse, i.e., denoting a
conditioned node by $\blacksquare$, the motifs $\to \blacksquare \to$,
$\leftarrow \blacksquare \to$, $- \blacksquare \to$, and $- \blacksquare -$ are blocked and the
motifs $\to \blacksquare \leftarrow$ and $\to \blacksquare -$ become open. Note that for the
alternative definition of contemporaneous links, Eq. (9), marked with dashed lines, the motif
- - $\bullet$ - - is blocked while the conditioned motif - - $\blacksquare$ - - is open.

Two nodes u and v are separated given a set S if all paths between the two are blocked. Conversely,
two nodes are connected given a set S if at least one path between the two is open. The Markov
property, which we assume throughout, now relates separation in the time series graph to conditional
independence relations in the underlying process which can be quantified with CMI (as a conditional
independence measure):

\[ u \text{ and } v \text{ separated given } S \implies I(u; v \mid S) = 0. \qquad (12) \]
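
Because a stationary time series graph repeats the same lagged links at every time step, directed (causal) paths can be enumerated from a short list of links. The sketch below is a hypothetical illustration: the link list is an assumption chosen so that it reproduces the three causal paths between $X_{t-3}$ and $Y_t$ named in the caption of Fig. 3, and any further links of that figure (for example autodependencies or common drivers) are deliberately left out. The function name and data layout are likewise illustrative.

```python
def causal_paths(links, source, target):
    """All directed paths from source to target in a stationary lagged graph.

    links:  list of (source_variable, target_variable, lag) with lag > 0
    source: (variable, time), target: (variable, time), e.g. ("X", -3), ("Y", 0)
    """
    paths = []

    def extend(path):
        var, time = path[-1]
        if (var, time) == target:
            paths.append(path)
            return
        for src, dst, lag in links:
            # Follow a link forward in time as long as it does not overshoot the target.
            if src == var and time + lag <= target[1]:
                extend(path + [(dst, time + lag)])

    extend([source])
    return paths

# Assumed links (variable -> variable at the given lag), see the lead-in above.
links = [("X", "W1", 1), ("X", "W2", 2), ("W1", "W2", 1), ("W1", "Y", 2), ("W2", "Y", 1)]

paths = causal_paths(links, ("X", -3), ("Y", 0))
for p in paths:
    print(" -> ".join(f"{var}(t{time:+d})" if time else f"{var}(t)" for var, time in p))

# Nodes on causal paths, including the source and excluding the target.
path_nodes = {node for p in paths for node in p[:-1]}
```

The set `path_nodes` collects the source together with all intermediate nodes, which is the kind of node set the path-based measures condition around.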

The path-based CMIs are constructed with conditions to block all non-causal paths and only leave
open causal paths. In particular, contemporaneous sidepaths, which start with one or more
contemporaneous links followed by a directed path, $u - \bullet - \cdots \to v$, also need to be
blocked. Note that we do not consider contemporaneous causal effects here which might occur due to a
too low sampling rate of the process.

Type                              | Conditioned on                                                        | Transfer measures               | Interaction measures
----------------------------------|-----------------------------------------------------------------------|---------------------------------|---------------------
Granger causality / TE type       | parents of target process Y                                           | (D)TE (not lag-specific), ITY   | -
Sims causality type               | parents and neighbors of source process X                             | ITX                             | IIX
Causal information pathways type  | parents and neighbors of source and parents of all pathway variables  | MIT (causal links only), MITP   | MII

Table I. Three different types of time series graph-based measures of information transfer
(Tigramite approach). Transfer measures refer to CMI-based quantities to measure information
transfer between two variables, interaction measures to the interaction information-based quantities
between multiple variables. Decomposed transfer entropy (DTE) was introduced in Ref. [20].

IV. TIME SERIES GRAPH BASED MEASURES OF INFORMATION TRANSFER (TIGRAMITE APPROACH)

In the following we briefly discuss the transfer entropy ansatz to measuring information transfer
and introduce our novel approach to quantify different aspects of information transfer through
causal links and paths. Table I provides an overview of these different classes of measures. As
mentioned in the introduction, the proposed measures of information transfer are CMIs based on
different sets of conditions which we determine from the reconstructed time series graph. The
Tigramite approach has the advantage of a low-dimensional estimation problem without arbitrary
truncation parameters like in the original definition of transfer entropy involving infinite
vectors.
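
How the choice of conditioning set decides what a CMI measures can be seen in a linear Gaussian chain, where CMI reduces to the closed form $-\tfrac{1}{2}\ln(1-\rho^2)$ with $\rho$ the (partial) correlation. The sketch below is a worked example under assumed settings: the model $X_t = \eta^X_t$, $W_t = a X_{t-1} + \eta^W_t$, $Y_t = b W_{t-1} + \eta^Y_t$ with independent unit-variance noise and the coefficients a = b = 0.8 are hypothetical. It contrasts the unconditional MI between $X_{t-2}$ and $Y_t$ with the CMI conditioned on the parent of the target, $W_{t-1}$.

```python
import math

def gaussian_mi(r):
    """MI in nats of two jointly Gaussian variables with (partial) correlation r."""
    return -0.5 * math.log(1 - r ** 2)

def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of a and b given c from pairwise correlations."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

a, b = 0.8, 0.8  # hypothetical coupling coefficients

# Variances and lagged covariances implied by the assumed chain model.
var_x = 1.0
var_w = a ** 2 * var_x + 1.0
var_y = b ** 2 * var_w + 1.0
cov_xw = a * var_x      # cov(X_{t-2}, W_{t-1})
cov_wy = b * var_w      # cov(W_{t-1}, Y_t)
cov_xy = a * b * var_x  # cov(X_{t-2}, Y_t)

r_xw = cov_xw / math.sqrt(var_x * var_w)
r_wy = cov_wy / math.sqrt(var_w * var_y)
r_xy = cov_xy / math.sqrt(var_x * var_y)

mi_xy = gaussian_mi(r_xy)                                 # I(X_{t-2}; Y_t) > 0
cmi_xy_given_w = gaussian_mi(partial_corr(r_xy, r_xw, r_wy))  # I(X_{t-2}; Y_t | W_{t-1}) = 0
mi_wy = gaussian_mi(r_wy)                                 # direct link W_{t-1} -> Y_t

print(f"I(X_t-2; Y_t)         = {mi_xy:.4f} nats")
print(f"I(X_t-2; Y_t | W_t-1) = {cmi_xy_given_w:.4f} nats")
print(f"I(W_t-1; Y_t)         = {mi_wy:.4f} nats")
```

Conditioning on the parents of the target thus removes the indirect influence of X entirely, which is why measures of this type cannot by themselves quantify information transfer along indirect paths.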


A. Transfer entropy ansatz

Transfer entropy (TE), introduced by Schreiber [?], is the information-theoretic analogue of Granger
causality and for multivariate Gaussian processes they can be shown to be equivalent [?]. The key
idea to arrive at a causal notion of information transfer is to measure the information content of
the past of a process X at times $t' < t$ about the target variable Y at time t and exclude
information from the common history shared by X and Y. In its multivariate version, TE is defined as

\[ I^{\mathrm{TE}}_{X \to Y} = I(X_t^-; Y_t \mid \mathbf{X}_t^- \setminus X_t^-). \qquad (13) \]

TE measures the aggregated influence of X at all past lags, i.e., it is not lag-specific, and leads
to the problem that infinite-dimensional densities have to be estimated, which is commonly called
the “curse of dimensionality”. In Ref. [20] this problem is overcome by a decomposition formula. In
practice, however, a truncated version at some maximal delay is typically used. In Ref. [?] a
lag-specific variant of TE taking into account the time series graph structure was introduced,
called the information transfer to Y (ITY), defined as

\[ I^{\mathrm{ITY}}_{X \to Y}(\tau) = I(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t} \setminus \{X_{t-\tau}\}). \qquad (14) \]

ITY is different from a bivariate lag-specific TE definition such as in Ref. [66] since it
explicitly uses the previously reconstructed parents $\mathcal{P}_{Y_t} \subset \mathbf{X}_t^-$,
which include drivers from the past of the whole process and not only Y’s own past.

TE can be derived as one component of decomposing the prediction entropy
$I(\mathbf{X}_t^-; Y_t)$ [?]. A similar approach is developed in Ref. [?]. The decisive difference
of these transfer entropy related measures to our proposed framework is that they measure the
contribution of different drivers to predicting a target variable Y, i.e., they are aimed at
decomposing the predictive information. In particular, Granger causality, TE or ITY are zero for
indirect causal interactions, i.e., if the interaction is mediated via another measured process.
With respect to time series graphs, ITY is one way to quantify the strength of a causal coupling
link between X and Y at some lag $\tau$. For a detailed account on the interpretability of different
measures of the strength of causal links see Ref. [?].


B. Quantifying information transfer along paths

In this article the main question of interest is not only how strong a causal link is, but more
generally how strong an indirect causal influence of a variable $X_{t-\tau}$ on $Y_t$ is (Fig. 3).
Indirect causal effects can only be transferred on causal paths in the time series graph, which are
paths consisting only of directed links as defined in Sect. III B. Note that Fig. 3(c) shows an
aggregated process graph, which is not suited to read off causal paths since it does not show the
full spatio-temporal causal structure (including autodependencies) like time series graphs.

We denote the processes along causal paths including $X_{t-\tau}$ for $\tau > 0$ and excluding
$Y_t$ by

\[ \mathcal{C}_{X_{t-\tau} \to Y_t} = \{X_{t-\tau}\} \cup \{W_{t-\tau_w} \in \mathbf{X}_t^- \text{ with } \tau > \tau_w > 0 : X_{t-\tau} \to \cdots \to W_{t-\tau_w} \to \cdots \to Y_t\}, \qquad (15) \]

where “$\to \cdots \to$” denotes a succession of directed links or only one directed link. These can
be read off directly from the time series graph. For example, in Fig. 3 $X_{t-3}$ and $Y_t$ are
connected by the three causal paths $X_{t-3} \to W_{2,t-1} \to Y_t$,
$X_{t-3} \to W_{1,t-2} \to Y_t$, and $X_{t-3}$
Figure 3. (Color online) Time series graphs illustrating the path-based measures of information transfer ITX (a) and MITP (b), and process graph (c, the labels denote the lags). Directed links are marked by arrows, contemporaneous links by a solid line. There are three causal directed paths connecting $X_{t-3}$ and $Y_t$ (black lines), two of length 2 via $W_{1,t-2}$ and $W_{2,t-1}$ and one of length 3: $X_{t-3} \to W_{1,t-2} \to W_{2,t-1} \to Y_t$. The idea of the measure ITX shown in (a) is to quantify how much of the information entering the system in $X_{t-3}$, i.e., the dynamical noise $\eta^X_{t-3}$, is transferred along causal paths to $Y_t$ by conditioning out the effect of the parents $\mathcal{P}_{X_{t-3}}$ of $X_{t-3}$ (solid red boxes), its neighbors involving contemporaneous sidepaths to $Y_t$, denoted $\mathcal{N}^{Y_t}_{X_{t-3}}$ (dashed red box), and the neighbors' parents $\mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-3}})$ (dotted red boxes). The latter two conditioning sets exclude contemporaneous sidepaths like $X_{t-3} - W_{1,t-3} \to W_{2,t-2} \to Y_{t-1} \to Y_t$. ITX still depends on processes affecting intermediate nodes on causal paths, e.g., process $Z_3$ which drives $W_1$ and $Y$. The idea of MITP shown in (b) is to go one step further and isolate all causal paths from the remaining process by additionally conditioning on the parents of the intermediate path nodes $\mathcal{C}_{X_{t-\tau} \to Y_t} \setminus \{X_{t-\tau}\}$ (dashed blue boxes) and of $Y$ (solid blue boxes). This also allows one to isolate mediated effects using momentary interaction information as defined in Sect. IV C.


$Y_t$ such that $\mathcal{C}_{X_{t-3} \to Y_t} = \{X_{t-3}, W_{1,t-2}, W_{2,t-1}\}$. Our goal is now to construct a CMI with conditions that leave open only these causal paths and block all non-causal paths according to the definition of paths and blocking in time series graphs in Sect. III B.
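
Reading the path nodes off a time series graph can be sketched in a few lines of code. The following is only an illustrative sketch: the link dictionary `LINKS`, its lags, and the helper names `causal_paths` and `path_nodes` are hypothetical and merely chosen so that the toy graph has the same three causal paths as the example above; autodependencies and contemporaneous links are left out for brevity.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]  # (process name, lag); ("X", 3) stands for X at time t-3

# Hypothetical lagged links "source -> target" with their lag in time steps.
LINKS: Dict[Tuple[str, str], int] = {
    ("X", "W1"): 1,
    ("X", "W2"): 2,
    ("W1", "W2"): 1,
    ("W1", "Y"): 2,
    ("W2", "Y"): 1,
}


def causal_paths(links: Dict[Tuple[str, str], int],
                 source: str, tau: int, target: str) -> List[List[Node]]:
    """Enumerate all directed paths from (source, t-tau) to (target, t)."""
    found: List[List[Node]] = []

    def extend(path: List[Node]) -> None:
        name, lag = path[-1]
        if name == target and lag == 0:
            found.append(list(path))
            return
        for (src, dst), step in links.items():
            # Every link moves forward in time, so the remaining lag shrinks.
            if src == name and lag - step >= 0:
                path.append((dst, lag - step))
                extend(path)
                path.pop()

    extend([(source, tau)])
    return found


def path_nodes(paths: List[List[Node]]) -> Set[Node]:
    """Nodes on the causal paths, including the source, excluding the target."""
    return {node for path in paths for node in path[:-1]}


paths = causal_paths(LINKS, "X", 3, "Y")
print(len(paths), sorted(path_nodes(paths)))
```

Because every link has a lag of at least one time step, the recursion always terminates; contemporaneous links would need separate handling.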


The first step is to exclude paths due to common drivers of $X$ and $Y$. The parents $\mathcal{P}_{X_{t-\tau}}$ of $X$ at time $t-\tau$ block all common drivers from the past since these paths necessarily contain the motifs $\leftarrow \bullet \to X_{t-\tau}$ or $\to \bullet \to X_{t-\tau}$, which are both blocked if the middle node $\bullet$ is conditioned on. A second class of non-causal paths are contemporaneous sidepaths as defined in Sect. III B. These can be blocked by conditioning on those contemporaneous neighbors of $X_{t-\tau}$ that have at least one contemporaneous sidepath, of course not traversing $X_{t-\tau}$, which we define as

$$\mathcal{N}^{Y_t}_{X_{t-\tau}} = \left\{W_{t-\tau} \in \mathcal{N}_{X_{t-\tau}} : X_{t-\tau} - W_{t-\tau} \rightsquigarrow \cdots \to Y_t\right\}, \tag{16}$$

where $\rightsquigarrow \cdots \to$ denotes either a directed path or a contemporaneous sidepath that does not involve $X_{t-\tau}$. For example, in Fig. 3(a,b), $\mathcal{N}^{Y_t}_{X_{t-3}} = \{W_{1,t-3}\}$. On the other hand, for the causal path $X_{t-2} \to X_{t-1} \to X_t$ we have $\mathcal{N}^{X_t}_{X_{t-2}} = \emptyset$ since there are no contemporaneous sidepaths from $W_{1,t-2}$ to $X_t$. The condition on neighbors unfortunately introduces new open paths because $X_{t-\tau} - \bullet \leftarrow$ is an open motif. To block these paths, one needs to additionally condition on the parents of the neighbors $\mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}})$. Note that one could also only select those parents from $\mathcal{P}_{X_{t-\tau}}$ which have a 'common driver path' to $Y_t$, but our goal is to isolate the momentary information entering the system in $X$, i.e., the dynamical noise $\eta^X_{t-\tau}$ of the model, and quantify its propagation along causal paths to $Y$ some time later. The information transfer from $X$ (ITX) is now defined for $\tau > 0$ as

$$I^{\mathrm{ITX}}_{X \to Y}(\tau) = I\left(X_{t-\tau}; Y_t \mid \mathcal{P}_{X_{t-\tau}}, \mathcal{N}^{Y_t}_{X_{t-\tau}}, \mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}})\right). \tag{17}$$

It measures the part of source entropy in $X_{t-\tau}$ that reaches $Y_t$ on any causal path and could be regarded as an information-theoretic analogue to Sims causality as mentioned in the introduction (see also Table I). In Ref. [48] this measure was introduced without the condition on neighbors.

ITX does not exclude information entering process $Y_t$ from other sources, for example from process $Z_3$ in the example shown in Fig. 3(a). The idea of momentary information transfer [48] was to isolate the information shared between two processes via a causal link from the remaining process. Now this idea can be generalized by isolating all causal paths from the remaining process to assess the part of the source entropy of $X_{t-\tau}$ that is transferred on any causal path and shared with $Y_t$, excluding the parents of all intermediate path nodes and of $Y$ that are not part of the causal path. Figure 3(b) illustrates this idea. With the nodes on all causal paths including $X_{t-\tau}$ denoted by $\mathcal{C}_{X_{t-\tau} \to Y_t}$ (Eq. (15)), the momentary information transfer along causal paths (MITP) is defined as

$$I^{\mathrm{MITP}}_{X \to Y}(\tau) = I\left(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t} \setminus \mathcal{C}_{X_{t-\tau} \to Y_t}, \mathcal{P}(\mathcal{C}_{X_{t-\tau} \to Y_t}), \mathcal{N}^{Y_t}_{X_{t-\tau}}, \mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}})\right). \tag{18}$$

For the time series graph example in Fig. 3(b), these conditions are marked by the red and blue boxes. In Sect. V we will prove that MITP, contrary to ITX, also fulfills a generalized coupling strength autonomy theorem which allows one to better relate it to the underlying dynamics of a process, as will be discussed in Sect. VI.

If $\mathcal{C}_{X_{t-\tau} \to Y_t} = \{X_{t-\tau}\}$, and under the "no sidepath"-constraint in Ref. [48], the conditions on the neighbors can be dropped and MITP collapses to MIT.


C. Quantifying mediating information transfer

Looking at Fig. 3, one immediate question is whether one can quantify how much of the information transfer between $X$ and $Y$ went through $W_1$ and how much through $W_2$. Which of these is information-theoretically more important for explaining the indirect causal relationship between $X$ and $Y$? The interaction information defined earlier can be used to answer this question; we here discuss two analogous versions for the measures ITX and MITP. For two processes $X_{t-\tau}$ and $Y_t$ connected by a causal path, intermediate processes can occur with multiple lags. For example, among the causal paths between $X_{t-4}$ and $Y_t$ in Fig. 3, the process $W_1$ is traversed at lags $W_{1,t-2}$ and $W_{1,t-3}$. Generally, if a subprocess $W$ is intermediate in an interaction $X_{t-\tau} \to \cdots \to Y_t$ at multiple lags $t-\tau_1, t-\tau_2, \ldots$, we here include all these lags in the vector $\mathbf{W} = \{W_{t-\tau_1}, W_{t-\tau_2}, \ldots\} \subset \mathcal{C}_{X_{t-\tau} \to Y_t}$.

First, we define the interaction information from $X$ (IIX) as

$$I^{\mathrm{IIX}}_{X \to Y \mid \mathbf{W}}(\tau) = I\left(X_{t-\tau}; Y_t; \mathbf{W} \mid \mathcal{P}_{X_{t-\tau}}, \mathcal{N}^{Y_t}_{X_{t-\tau}}, \mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}})\right) \tag{19}$$

$$= I^{\mathrm{ITX}}_{X \to Y}(\tau) - \underbrace{I\left(X_{t-\tau}; Y_t \mid \mathcal{P}_{X_{t-\tau}}, \mathcal{N}^{Y_t}_{X_{t-\tau}}, \mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}}), \mathbf{W}\right)}_{\text{ITX conditioned on } \mathbf{W}}. \tag{20}$$

IIX measures the effect of an intermediate process $W$ on the information transfer between the source information of $X_{t-\tau}$ and $Y_t$. Second, the momentary interaction information (MII) for an intermediate process $W$ is defined as

$$I^{\mathrm{MII}}_{X \to Y \mid \mathbf{W}}(\tau) = I\left(X_{t-\tau}; Y_t; \mathbf{W} \mid \mathcal{P}_{Y_t} \setminus \mathcal{C}_{X_{t-\tau} \to Y_t}, \mathcal{P}(\mathcal{C}_{X_{t-\tau} \to Y_t}), \mathcal{N}^{Y_t}_{X_{t-\tau}}, \mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}})\right) \tag{21}$$

$$= I^{\mathrm{MITP}}_{X \to Y}(\tau) - \underbrace{I\left(X_{t-\tau}; Y_t \mid \mathcal{P}_{Y_t} \setminus \mathcal{C}_{X_{t-\tau} \to Y_t}, \mathcal{P}(\mathcal{C}_{X_{t-\tau} \to Y_t}), \mathcal{N}^{Y_t}_{X_{t-\tau}}, \mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}}), \mathbf{W}\right)}_{\text{MITP conditioned on } \mathbf{W}}. \tag{22}$$

MII measures the effect of $W$ on the momentary information transfer along paths between $X_{t-\tau}$ and $Y_t$ and additionally isolates the influence of drivers of the causal path processes. In Section V we discuss several examples demonstrating that IIX and MII are not necessarily positive, implying that an intermediate process can counteract the interaction between $X_{t-\tau}$ and $Y_t$. This measure can naturally be extended by including sets of processes from $\mathcal{C}_{X_{t-\tau} \to Y_t}$. Due to the symmetry of interaction information, MII is symmetric in its arguments excluding the condition.

Table I provides an overview of the different classes of measures discussed here. In a climate data example in Sect. VII we will see how IIX and MII can be used to quantify dominant pathway mechanisms, and in Sect. VI C we discuss how they can be used as an aggregate measure of 'causal interaction betweenness', modifying concepts from complex network theory for functional network analysis [?].
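
To make the conditioning sets of ITX and MITP concrete, the following sketch assembles them for the three-process graph used in the linear model example of Sect. V A (autodependencies at lag 1, links X to W at lag 1, W to Y at lag 1, and X to Y at lag 2). It is a simplified illustration under stated assumptions: there are no contemporaneous links (so the neighbor terms are empty), the path nodes are passed in by hand, the parents of path nodes are taken excluding the path nodes themselves, and all function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, int]  # (process name, lag); ("X", 2) stands for X at time t-2

# Lagged links (source, target) -> lag of the three-process example graph.
LINKS: Dict[Tuple[str, str], int] = {
    ("X", "X"): 1,
    ("W", "W"): 1,
    ("Y", "Y"): 1,
    ("X", "W"): 1,
    ("W", "Y"): 1,
    ("X", "Y"): 2,
}


def parents(links: Dict[Tuple[str, str], int], node: Node) -> Set[Node]:
    """Parents of a node in the time series graph (lags shift by stationarity)."""
    name, lag = node
    return {(src, lag + step) for (src, dst), step in links.items() if dst == name}


def itx_conditions(links: Dict[Tuple[str, str], int], source: Node) -> Set[Node]:
    """ITX conditions without contemporaneous neighbors: the parents of the source."""
    return parents(links, source)


def mitp_conditions(links: Dict[Tuple[str, str], int],
                    path_nodes: Set[Node], target: Node) -> Set[Node]:
    """MITP conditions without neighbors: parents of the target and of all path
    nodes, with the path nodes themselves removed (an assumed reading)."""
    conditions = parents(links, target) - path_nodes
    for node in path_nodes:
        conditions |= parents(links, node) - path_nodes
    return conditions


# Path nodes for the interaction X_{t-2} -> Y_t (direct link plus path via W_{t-1}).
PATH_NODES: Set[Node] = {("X", 2), ("W", 1)}

print(sorted(itx_conditions(LINKS, ("X", 2))))
print(sorted(mitp_conditions(LINKS, PATH_NODES, ("Y", 0))))
```

For this graph the sketch yields the condition $X_{t-3}$ for ITX and the conditions $X_{t-3}, W_{t-2}, Y_{t-1}$ for MITP at $\tau = 2$.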


V. EXAMPLES AND THEOREMS 

In the following we discuss how the novel approach al¬ 
lows to extract a detailed picture of interaction mechanisms 
between multiple processes. 
A. Linear model example

In Ref. [48] the strength of direct causal links was studied. The main finding was that MIT solely depends on the coefficient corresponding to the causal link. This property was called coupling strength autonomy in Ref. [48] and will be reviewed in Sect. V C. For the case of interactions along causal paths, consider the following linear model with time series graph visualized in Fig. 4(a):

$$\begin{aligned} X_t &= \alpha X_{t-1} + \eta^X_t \\ W_t &= \alpha W_{t-1} + a X_{t-1} + \eta^W_t \\ Y_t &= \alpha Y_{t-1} + c X_{t-2} + b W_{t-1} + \eta^Y_t, \end{aligned} \tag{23}$$

where all processes are jointly zero-mean Gaussian with variances $\sigma_X^2, \sigma_W^2, \sigma_Y^2$ of the innovation terms $\eta^{\cdot}$. Here the influence of $X_{t-2}$ on $Y_t$ has two paths: one via the direct coupling link $X_{t-2} \to Y_t$ and one via the path $X_{t-2} \to W_{t-1} \to Y_t$, such that we can rewrite

$$Y_t = c X_{t-2} + b\left(a X_{t-2} + \alpha W_{t-2} + \eta^W_{t-1}\right) + \alpha Y_{t-1} + \eta^Y_t, \tag{24}$$

from which we see that the coupling cannot be unambiguously related to one coefficient and interesting dynamics emerge. In Fig. 4(b) we investigate the measures ITX, MITP, IIX, and MII numerically for varying $a = b$ (strength of the sidepath) and $c$ (strength of direct link) for fixed autodependency strength $\alpha = 0.5$. We assume $a, b \neq 0$, because otherwise this causal path vanishes and IIX or MII are not defined. The ensemble size to estimate the ensemble mean is 30, the sample length is $T = 10{,}000$, and the CMI nearest-neighbor estimation parameter is $k = 1$ to achieve minimal bias [?]. As mentioned in Sect. II C, for larger $k$ the bias increases, but the estimator's variance decreases [?], making higher $k$ values a better choice for independence tests as used in the causal algorithm [20].

Since we vary $a$ together with $b$, the contribution via this sidepath is always positive, also for negative $a, b$. If $c$ is also positive, we observe an increase in ITX as well as MITP (Fig. 4(b)), with the latter being more pronounced. For negative $c$, on the other hand, the contributions of the direct link and the sidepath counteract and, for certain values $(a, b, c)$, even cancel out, leading to a vanishing ITX and MITP.

These different types of mediation of the intermediate process $W$ can be quantified by IIX and MII (lower panels in Fig. 4(b)): For positive $c$, both are larger than zero, showing the positive contribution of both mechanisms; here, too, MII is more pronounced. For $c = 0$, MII is equal to MITP because the only interaction stems from the causal path, demonstrating the explanatory influence of $W$, which acts as the only mediating process. In the Venn diagram this corresponds to the case in which $H(W)$ entails all of the shared entropy between $X$ and $Y$. For negative $c$, the counteracting effect is evident in the negative sign of IIX and MII, which implies for the latter that $I^{\mathrm{MITP}}_{X \to Y \mid W}(\tau = 2) > I^{\mathrm{MITP}}_{X \to Y}(\tau = 2)$: Conditioning out the effect of the intermediate process $W$ here reveals that the direct link is actually very strong and was only 'masked' by the counteracting sidepath via $W$. In Ref. [?] a similar case, but without isolating the interaction pathway, was termed a "synergistic" contribution to the predictive information about $Y$ as opposed to the "redundant" case with a positive interaction information.

In Fig. 4(c) the dependence of the four measures for $a = b = 0.5$ and varying the autodependency strength $\alpha$ and direct link strength $c$ is shown. ITX features a strong dependency on $\alpha$ already for weak drivings $\alpha \approx 0.4$ and almost vanishes for a very strong driving. Note that the same effect would be observed if other external processes drive $W$ and $Y$ (from $X$ the effect is partially excluded due to the condition on $\mathcal{P}_X$). Analytically, here ITX can only be reduced to

$$I^{\mathrm{ITX}}_{X \to Y}(\tau = 2) = I\left(X_{t-2}; Y_t \mid X_{t-3}\right) = I\left(\alpha X_{t-3} + \eta^X_{t-2}; Y_t \mid X_{t-3}\right) = I\left(\eta^X_{t-2}; Y_t \mid X_{t-3}\right), \tag{25}$$

which still depends on many coefficients in the model and cannot be easily related to the underlying dynamics. On the other hand, MITP can be simply related to the coefficients along the causal paths as

$$I^{\mathrm{MITP}}_{X \to Y}(\tau = 2) = \frac{1}{2} \ln\left(1 + \frac{(c + ab)^2 \sigma_X^2}{b^2 \sigma_W^2 + \sigma_Y^2}\right), \tag{26}$$

which follows from Theorem 2 in Sect. V C. Here it becomes evident that MITP vanishes along the parabola $c = -ab$ (which can be considered a pathological case where the causal assumption of faithfulness is violated [?]). A second important finding is that MITP is independent of the autodependency coefficient $\alpha$. The same holds for MII, here given by

$$I^{\mathrm{MII}}_{X \to Y \mid W}(\tau = 2) = \frac{1}{2} \ln\left(1 + \frac{(c + ab)^2 \sigma_X^2}{b^2 \sigma_W^2 + \sigma_Y^2}\right) - \frac{1}{2} \ln\left(1 + \frac{c^2 \sigma_X^2 \sigma_W^2}{(\sigma_W^2 + a^2 \sigma_X^2)\, \sigma_Y^2}\right), \tag{27}$$

as follows from Appendix A 4. This implies that the values of MITP and MII can solely be related to the model's coefficients along the causal interaction paths, which can be considered an advantage in interpreting these measures compared to ITX or IIX. While in this example there are no external parents influencing the processes along the path, in more complex schemes their effect can also be excluded by the condition on the parents of the nodes on the path denoted by $\mathcal{C}_{X_{t-\tau} \to Y_t}$. In Sect. V C this will be proven for the general case. Note that in Fig. 4(c) MITP and MII are slightly affected for very strong autodependencies, which is due to an estimation bias and vanishes for infinite sample sizes. This model will be further discussed in relation to linear causal effect measures in Sect. VI A.
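
The Gaussian expressions for MITP and MII in this linear example can be cross-checked numerically without any estimator. The sketch below is illustrative only and rests on one assumption taken from the discussion above: after conditioning on $X_{t-3}, W_{t-2}, Y_{t-1}$, the three variables reduce to the innovation combinations $\eta^X$, $a\eta^X + \eta^W$, and $(c+ab)\eta^X + b\eta^W + \eta^Y$. Their covariance matrix is written down exactly and the (conditional) mutual informations are computed from determinants; all function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def det(m: Matrix) -> float:
    """Determinant by Laplace expansion (adequate for the tiny matrices here)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def sub(cov: Matrix, idx: Sequence[int]) -> Matrix:
    """Sub-covariance matrix for the given variable indices."""
    return [[cov[i][j] for j in idx] for i in idx]


def gaussian_cmi(cov: Matrix, x: int, y: int, z: Sequence[int] = ()) -> float:
    """I(X;Y|Z) in nats for jointly Gaussian scalars, from covariance determinants."""
    z = list(z)
    numerator = det(sub(cov, [x] + z)) * det(sub(cov, [y] + z))
    denominator = det(sub(cov, [x, y] + z))
    if z:
        denominator *= det(sub(cov, z))
    return 0.5 * math.log(numerator / denominator)


def residual_cov(a: float, b: float, c: float,
                 sx: float = 1.0, sw: float = 1.0, sy: float = 1.0) -> Matrix:
    """Covariance of (eta_X, a*eta_X + eta_W, (c+ab)*eta_X + b*eta_W + eta_Y);
    sx, sw, sy are the innovation variances."""
    g = c + a * b
    return [
        [sx, a * sx, g * sx],
        [a * sx, a * a * sx + sw, a * g * sx + b * sw],
        [g * sx, a * g * sx + b * sw, g * g * sx + b * b * sw + sy],
    ]


def mitp_closed(a, b, c, sx=1.0, sw=1.0, sy=1.0):
    """Closed form of MITP for the linear Gaussian triple."""
    return 0.5 * math.log(1 + (c + a * b) ** 2 * sx / (b * b * sw + sy))


def mii_closed(a, b, c, sx=1.0, sw=1.0, sy=1.0):
    """Closed form of MII: MITP minus MITP conditioned on W."""
    direct = 0.5 * math.log(1 + c * c * sx * sw / ((sw + a * a * sx) * sy))
    return mitp_closed(a, b, c, sx, sw, sy) - direct


def mitp_numeric(a, b, c):
    return gaussian_cmi(residual_cov(a, b, c), 0, 2)


def mii_numeric(a, b, c):
    cov = residual_cov(a, b, c)
    return gaussian_cmi(cov, 0, 2) - gaussian_cmi(cov, 0, 2, [1])


print(mitp_numeric(0.5, 0.5, 0.4), mitp_closed(0.5, 0.5, 0.4))
print(mii_numeric(0.5, 0.5, -0.25), mii_closed(0.5, 0.5, -0.25))
```

In this sketch the autodependency strength does not enter at all, which mirrors the coupling strength autonomy discussed above, and the choice $c = -ab$ gives a vanishing MITP together with a negative MII.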
Figure 4. (Color online) (a) Time series graph for model (23) of a causal interaction between three processes at different lags (black dots). The parents are shown in colored boxes; here there are no neighbors. $a, b, c, \alpha$ denote the model coefficients. In (b) the interaction measures are plotted against $a = b$ (strength of the sidepath) and $c$ (strength of direct link) for an autodependency strength $\alpha = 0.5$, and in (c) against $c$ and the autodependency strength $\alpha$ for $a = b = 0.5$. The color shading only emphasizes the sign and strength; the value can be read off the z-axis. All innovation terms $\eta$ have unit variance. Further parameters: ensemble size 30, sample length $T = 10{,}000$, nearest-neighbor estimation parameter $k = 1$.


B. Nonlinear model example 


Next, we discuss a nonlinear version of model (23) which shares the same time series graph, but features different dynamics:

$$\begin{aligned} X_t &= \alpha X_{t-1} + \eta^X_t \\ W_t &= \alpha W_{t-1} + a X_{t-1} + \eta^W_t \\ Y_t &= \alpha Y_{t-1} + \underbrace{c\, b\, X_{t-2} W_{t-1}}_{\text{multiplicative dependency}} + \eta^Y_t, \end{aligned} \tag{28}$$

Figure 5. (Color online) Same as in Fig. 4, but for the nonlinear model (28).


with Gaussian innovation terms as before. Figure 5(a) shows that ITX and MITP vanish for $b$ or $c$ equal to zero and increase for larger absolute values. For larger $|c|$ and certain values of $a, b$ we observe a counteracting of $W$ through the indirect path, as can be seen from the negative IIX and MII, but no annihilation of both effects occurs here and ITX and MITP stay positive. For this nonlinear dependency structure both ITX and MITP (and the corresponding interaction informations) depend on the external forcing parameter $\alpha$ (Fig. 5(b)). The reason is that the nonlinearity mixes the terms and the dependencies cannot be conditioned out anymore. Consider model (28), but with differing autodependency terms $\alpha, \beta, \gamma$ for $X, W, Y$, respectively. MITP here is given by

$$I^{\mathrm{MITP}}_{X \to Y}(\tau = 2) = I\left(X_{t-2}; Y_t \mid X_{t-3}, W_{t-2}, Y_{t-1}\right) = I\left(\eta^X_{t-2}; Y_t \mid Y_{t-1}, W_{t-2}, X_{t-3}\right), \tag{29}$$

and the dependency of $Y_t$ can be rewritten as

$$\begin{aligned} Y_t ={}& c\,b\left(a\,\eta^X_{t-2} + \eta^W_{t-1}\right)\eta^X_{t-2} + \eta^Y_t \\ &+ c\,b\left(a\alpha X_{t-3}\eta^X_{t-2} + \alpha X_{t-3}\eta^W_{t-1}\right. \\ &\qquad \left. + \beta W_{t-2}\eta^X_{t-2} + a\alpha X_{t-3}\eta^X_{t-2}\right) \\ &+ \gamma Y_{t-1} + c\,b\left(\alpha\beta X_{t-3}W_{t-2} + a\alpha^2 X_{t-3}^2\right). \end{aligned} \tag{30}$$

Here in MITP the last line vanishes due to the condition on $(X_{t-3}, W_{t-2}, Y_{t-1})$, but due to the multiplicative mixing with the noise terms in the second and third line, the autodependency coefficients $\alpha, \beta$ (but not $\gamma$) still determine MITP. ITX additionally depends on $\gamma$. This model, therefore, demonstrates a case where 'external effects' cannot be excluded anymore. Thus, while the information-theoretic interpretation still holds, MITP cannot be easily related to the system's dynamics. Still, plots like in Figs. 4 and 5 can help to better understand dynamical interactions also in toy models from nonlinear dynamics. In the next section we prove under which general assumptions the coupling strength autonomy holds for MITP and MII. The multiplicative dependency can be seen as an example of synergy, which has recently gained a lot of interest in information-theoretical studies, see e.g. Refs. [67, 68]. In Ref. [?] synergistic effects are studied with respect to optimal prediction schemes.


C. Theorems

In this section we state some inequality relations among the novel measures and generalize the coupling strength autonomy theorem for MIT [48] to the path-based measures MITP and MII.

Theorem 1 (Inequality relations). For $\tau > 0$, the following inequalities hold:

$$I^{\mathrm{IIX}}_{X \to Y \mid \mathbf{W}}(\tau) \le I^{\mathrm{ITX}}_{X \to Y}(\tau) \tag{31}$$

$$I^{\mathrm{MII}}_{X \to Y \mid \mathbf{W}}(\tau) \le I^{\mathrm{MITP}}_{X \to Y}(\tau) \tag{32}$$

$$I^{\mathrm{ITX}}_{X \to Y}(\tau) \le I^{\mathrm{MITP}}_{X \to Y}(\tau). \tag{33}$$

The first two inequalities are trivially fulfilled since IIX and MII are defined by ITX and MITP minus a CMI, which is always non-negative. Equality holds if the intermediate node(s) $\mathbf{W}$ explain the entire interaction between $X$ and $Y$. The last inequality is proven in Appendix A 1. In practice, this inequality is often not fulfilled because the estimation dimension of MITP is typically much larger than that of ITX and finite sample effects lead to a negative bias, which often leads to MITP being smaller than ITX. This also makes a comparison of the values of ITX and MITP more difficult.

To generalize the coupling strength autonomy theorem from MIT to MITP and MII, we consider causal paths as defined in Sect. III B instead of only causal links. While the careful condition on only those neighbors that have sidepaths excludes dependencies of MITP and MII on the dynamics along these sidepaths, one cannot avoid a contemporaneous dependency on the interaction with the respective neighbor itself. This also holds for other intermediate processes on causal paths. For the following theorems, we define as a "no contemporaneous dependency"-condition

$$\forall\, W^{(i)}_{t-\tau_i} \in \mathcal{C}_{X_{t-\tau} \to Y_t} : \quad \mathcal{N}^{Y_t}_{W^{(i)}_{t-\tau_i}} = \emptyset, \tag{34}$$

with $\mathcal{N}^{Y_t}_{W^{(i)}_{t-\tau_i}}$ defined in Eq. (16). This condition implies that no contemporaneous sidepaths as defined in Sect. III B emanate from any of the path nodes $\mathcal{C}_{X_{t-\tau} \to Y_t}$ (including $X_{t-\tau}$) towards $Y_t$. Note that we denote by $W^{(i)}_{t-\tau_i}$ each individual subprocess along causal paths at a certain lag $\tau_i$. If one subprocess occurs at multiple lags, it will have another index $i$ for each lag.

Theorem 2 (Coupling strength autonomy for MITP). Let $X, Y$ be two subprocesses of a multivariate stationary discrete-time process $\mathbf{X}$ satisfying the Markov property with time series graph $\mathcal{G}$. We assume that $X_{t-\tau}$ and $Y_t$ are connected by a directed path with path nodes $\mathcal{C}_{X_{t-\tau} \to Y_t}$ including $X_{t-\tau}$ as defined in Eq. (15). We denote those parents of $Y_t$ that are in the path nodes as $\mathcal{P}^{\mathcal{C}}_{Y_t} = \mathcal{P}_{Y_t} \cap \mathcal{C}_{X_{t-\tau} \to Y_t}$ and correspondingly for other path nodes and assume the following dependencies:

$$\begin{aligned} X_{t-\tau} &= g_X\left(\mathcal{P}_{X_{t-\tau}}\right) + \eta^X_{t-\tau} \\ Y_t &= f_Y\left(\mathcal{P}^{\mathcal{C}}_{Y_t}\right) + g_Y\left(\mathcal{P}_{Y_t} \setminus \mathcal{P}^{\mathcal{C}}_{Y_t}\right) + \eta^Y_t, \end{aligned} \tag{35}$$

where $f_Y$ is linear and $g_X, g_Y$ arbitrary. Further, for all path nodes we assume the dependencies

$$W^{(i)}_{t-\tau_i} = f_i\left(\mathcal{P}^{\mathcal{C}}_{W^{(i)}_{t-\tau_i}}\right) + g_i\left(\mathcal{P}_{W^{(i)}_{t-\tau_i}} \setminus \mathcal{P}^{\mathcal{C}}_{W^{(i)}_{t-\tau_i}}\right) + \eta^i_{t-\tau_i} \quad \forall\, W^{(i)}_{t-\tau_i} \in \mathcal{C}_{X_{t-\tau} \to Y_t} \setminus \{X_{t-\tau}\}, \tag{36}$$

where the $f_i$ are again linear, the $g_i$ are arbitrary functions and the dynamical noise terms $\eta^{\cdot}$ are i.i.d. due to Markovity. Then, MITP (Eq. (18)) is given by

$$I^{\mathrm{MITP}}_{X \to Y}(\tau) = I\left(\eta^X_{t-\tau};\ \eta^Y_t + f\left(\eta^X_{t-\tau}, \textstyle\bigcup_i \eta^i_{t-\tau_i}\right) \,\Big|\, \mathcal{P}_{Y_t} \setminus \mathcal{C}_{X_{t-\tau} \to Y_t}, \mathcal{P}(\mathcal{C}_{X_{t-\tau} \to Y_t}), \mathcal{N}^{Y_t}_{X_{t-\tau}}, \mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}})\right), \tag{37}$$

where $0 < \tau_i < \tau\ \forall\, i$. If further the "no contemporaneous dependency"-condition (34) holds, MITP reduces to a mutual information

$$I^{\mathrm{MITP}}_{X \to Y}(\tau) = I\left(\eta^X_{t-\tau};\ \eta^Y_t + f\left(\eta^X_{t-\tau}, \eta^{\mathbf{W}}\right)\right), \tag{38}$$

where $f$ is a linear function and $\eta^{\mathbf{W}}$ denotes the innovation terms or dynamical noise of all path nodes in $\mathcal{C}_{X_{t-\tau} \to Y_t} \setminus \{X_{t-\tau}\}$.

The proof is given in Appendix A 3. This theorem also includes the coupling strength autonomy theorem for MIT [48] as a special case: if $\mathcal{C}_{X_{t-\tau} \to Y_t} = \{X_{t-\tau}\}$ and under the "no sidepath"-constraint in Ref. [48], then $I^{\mathrm{MITP}}_{X \to Y}(\tau) = I\left(\eta^X_{t-\tau};\ \eta^Y_t + f(\eta^X_{t-\tau})\right)$.

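Since momentary interaction information (MII) is the difference between MITP and the MITP conditioned on one of the path nodes (excluding $X_{t-\tau}$), the following theorem follows from the above theorem.

Theorem 3 (Coupling strength autonomy for MII). Using the same assumptions as for Theorem 2, the momentary interaction information $I^{\mathrm{MII}}_{X \to Y \mid \mathbf{W}}(\tau)$ between $X_{t-\tau}$, $Y_t$ and one or more intermediate processes $\mathbf{W} = (W_{t-\tau_1}, W_{t-\tau_2}, \ldots) \subset \mathcal{C}_{X_{t-\tau} \to Y_t} \setminus \{X_{t-\tau}\}$ indexed by $j$ reduces to

$$I\left(\eta^X_{t-\tau};\ \eta^Y_t + f\left(\eta^X_{t-\tau}, \textstyle\bigcup_i \eta^i_{t-\tau_i}\right);\ \left\{\eta^j_{t-\tau_j} + f_j\left(\eta^X_{t-\tau}, \textstyle\bigcup_{i \neq j} \eta^i_{t-\tau_i}\right)\right\}_j \,\Big|\, \mathcal{P}_{Y_t} \setminus \mathcal{C}_{X_{t-\tau} \to Y_t}, \mathcal{P}(\mathcal{C}_{X_{t-\tau} \to Y_t}), \mathcal{N}^{Y_t}_{X_{t-\tau}}, \mathcal{P}(\mathcal{N}^{Y_t}_{X_{t-\tau}})\right) \tag{39}$$

and, if further the "no contemporaneous dependency"-condition (34) holds, to

$$I\left(\eta^X_{t-\tau};\ \eta^Y_t + f\left(\eta^X_{t-\tau}, \textstyle\bigcup_i \eta^i_{t-\tau_i}\right);\ \left\{\eta^j_{t-\tau_j} + f_j\left(\eta^X_{t-\tau}, \textstyle\bigcup_{i \neq j} \eta^i_{t-\tau_i}\right)\right\}_j\right), \tag{40}$$

for linear functions $f, f_j$.

The proof is given in Appendix A 4. For the case of a causal triple as shown in Fig. 4(a) this further reduces to

$$I\left(\eta^X_{t-\tau};\ \eta^Y_t + (c + ab)\,\eta^X_{t-\tau} + b\,\eta^W_{t-\tau_W};\ \eta^W_{t-\tau_W} + a\,\eta^X_{t-\tau}\right), \tag{41}$$

from which the special case with Gaussian innovations, Eq. (27), follows.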

VI. DISCUSSION 

A. Relation to linear causal effect theory 

We phrased the idea of causal influence in an information-theoretic setting. Pearl's theory of causal effects [?] can also be embedded in the time series graph framework [49]. Assuming the time series graph is causally sufficient [70] and all dependencies are linear, causal effects can simply be derived from multivariate regressions. Firstly, in analogy to ITY or MIT as a measure of direct link strength, the path coefficient of a link is given by the corresponding (typically standardized) coefficient in a multivariate regression of each process on its parents in the time series graph [?]. Further, in analogy to ITX or MITP, the linear causal effect of $X_{t-\tau}$ on $Y_t$, also via indirect paths, can be estimated by a standardized regression of $Y$ on the multiple regressors $\{X_{t-\tau}, \mathcal{P}(X_{t-\tau})\}$. The linear Causal Effect (CE) [79, 21] is then given by the corresponding (standardized) regression coefficient $r$ belonging to $X_{t-\tau}$,

$$\mathrm{CE}_{X \to Y}(\tau) = r_{Y_t X_{t-\tau} \cdot \mathcal{P}(X_{t-\tau})}. \tag{42}$$

This formulation assumes the "no contemporaneous dependency"-condition (34) for simplicity, but it can be generalized. The causal effect $\mathrm{CE}_{X \to Y}(\tau)$ quantifies the change in the expectation of $Y_t$ (in units of its standard deviation) induced by raising the lagged $X_{t-\tau}$ by one standard deviation, while keeping the parents of $X_{t-\tau}$ constant. Then the total causal effect between lagged processes is simply given by the sum over the product of path coefficients along each causal path connecting $X_{t-\tau}$ and $Y_t$. For example, for the model (23) with time series graph in Fig. 4(a) the total linear causal effect between $X_{t-2}$ and $Y_t$ is given by $(c + ab)$ multiplied by a square-root factor, where the square root contains the normalization by the standard deviations which, however, depends on the autodependency strength and other coefficients here. ITX is simply the mutual information with the same conditions as CE (if no neighbors are present), while MITP for this model example (see Eq. (38) or Eq. (26)) is $\frac{1}{2} \ln\left(1 + \frac{(c + ab)^2 \sigma_X^2}{b^2 \sigma_W^2 + \sigma_Y^2}\right)$. ITX, MITP and CE all depend on the 'coupling mechanism' $(c + ab)$, but with different 'normalizations'.

Even in linear models, the Mediated Causal Effect (MCE) is more difficult to identify [79, 70]. The causal interpretation is that an indirect effect via the node(s) $W$ measures the increase we would see in $Y_t$ while holding $X_{t-\tau}$ and all other intermediate nodes and parents of $X_{t-\tau}$ constant and increasing the node(s) $W$ to whatever value it would obtain under a unit change in $X_{t-\tau}$ while holding the parents of $X_{t-\tau}$ constant [?]. To identify MCE for the triplet case in model (23) with time series graph in Fig. 4(a) one can subtract from CE the contribution of all paths not passing through $W$:

$$\mathrm{MCE}_{X \to Y \mid W}(\tau = 2) = \mathrm{CE}_{X \to Y}(2) - \underbrace{r_{Y_t X_{t-2} \cdot \mathcal{P}(X_{t-2}), W_{t-1}, \mathcal{P}(W_{t-1})}}_{\text{CE excluding paths through } W}. \tag{43}$$

Note the additional condition on the parents of $W$, needed here to exclude a confounding of the mediating link from $W$ to $Y$ from the past due to $W_{t-2}$. This is also the idea behind the interaction information MII, which is conditioned on the parents of all intermediate processes to exclude possible confounding. MII is also given by a difference, but of CMIs instead of regressions: $\frac{1}{2} \ln\left(1 + \frac{(c + ab)^2 \sigma_X^2}{b^2 \sigma_W^2 + \sigma_Y^2}\right) - \frac{1}{2} \ln\left(1 + \frac{c^2 \sigma_X^2 \sigma_W^2}{(\sigma_W^2 + a^2 \sigma_X^2)\, \sigma_Y^2}\right)$, where the latter term information-theoretically quantifies the strength of the direct link with coefficient $c$. The linear framework allows for quantifying the relative influence of paths between two processes by the 'locally' estimated weights, making it easy to interpret, but it rests on a linear assumption. Another advantage of the linear approach is that total and indirect effects can also be investigated in the frequency domain in the framework of directed transfer functions [25, ?]. To some extent causal effects can also be estimated for more general nonlinear structural equation models [?], but especially mediated effects are difficult to identify if no strong assumptions are fulfilled [70].
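
The path-coefficient rule for linear causal effects can be illustrated with a short sketch for the linear three-process model. Note the assumptions: the sketch works with raw (unstandardized) model coefficients rather than the standardized regression coefficients discussed above, the coefficient values are arbitrary, and the function and variable names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, Tuple

Node = Tuple[str, int]  # (process name, lag)

# Arbitrary illustrative coefficients of the linear three-process model.
ALPHA, A, B, C = 0.5, 0.5, 0.5, -0.2

# Links (source, target, lag) -> unstandardized coefficient.
COEFFS: Dict[Tuple[str, str, int], float] = {
    ("X", "X", 1): ALPHA,
    ("W", "W", 1): ALPHA,
    ("Y", "Y", 1): ALPHA,
    ("X", "W", 1): A,
    ("W", "Y", 1): B,
    ("X", "Y", 2): C,
}


def path_effect(coeffs: Dict[Tuple[str, str, int], float], node: Node, target: str,
                exclude: FrozenSet[str] = frozenset()) -> float:
    """Sum over all directed paths from node to (target, t) of the product of
    link coefficients, skipping paths through processes named in exclude."""
    name, lag = node
    if name == target and lag == 0:
        return 1.0
    total = 0.0
    for (src, dst, step), coef in coeffs.items():
        if src != name or lag - step < 0 or dst in exclude:
            continue
        total += coef * path_effect(coeffs, (dst, lag - step), target, exclude)
    return total


total_effect = path_effect(COEFFS, ("X", 2), "Y")
without_w = path_effect(COEFFS, ("X", 2), "Y", exclude=frozenset({"W"}))
mediated_by_w = total_effect - without_w

print(total_effect, without_w, mediated_by_w)
```

For lag 2 the total effect in this sketch equals $c + ab$ and the part mediated by $W$ equals $ab$; a standardized version would additionally rescale by the relevant standard deviations.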


B. Advantages and limitations of coupling strength autonomy 

MIT, MITP and MII to some extent disentangle the coupling structure. This is exactly the coupling strength autonomy that makes these measures well-interpretable as measures that solely depend on the "coupling mechanism" between $X_{t-\tau}$ and $Y_t$ (and possibly intermediate processes), as shown in the previous sections, autonomous of other external processes. One such possibly misleading input that is "filtered out" is autocorrelation or, more generally, autodependency, as has been shown in the model examples. This interpretability is facilitated by the careful conditioning on all possible confounding processes, which can be determined from the time series graph (assuming the graph entails all relevant processes, i.e., causal sufficiency [?]). In a way, coupling strength autonomy is an information-theoretic description similar to the identifiability of causal effects in Pearl's framework, but this connection needs to be further investigated.

However, the assumptions allowing for such an interpretability are quite restrictive: While arbitrary additive functional dependencies of the interaction processes on external drivers can be conditioned out, the whole interaction mechanism from $X$ to $Y$ via intermediate processes needs to be linear. Note that this does not imply that linear measures can be used instead, because these would not exclude arbitrary nonlinear external drivers. A further complication is that the potentially high dimensionality due to many external drivers leads to a strong bias in MITP and MII for smaller sample sizes, even for the most advanced information-theoretic estimators employed here [56, 57]. These limitations hamper the added value in interpretability of MITP and MII compared to ITX / IIX. But if no detailed knowledge of the dynamical equations is given, this approach at least is rigorously based on the time series graph encoding the Markovian conditional independence structure as an abstraction of the dynamics. Even if the equations are known, but feature highly complex chaotic behavior like toy models from nonlinear dynamics, plots of the measures introduced here like in Figs. 4 and 5 can help to better understand information transfer in dynamical interactions.


C. Information transfer and complex network theory 

In the literature of neuroscience [?] and recently also in climate research [?], multivariate datasets are often analyzed using pairwise association measures combined with complex network theory [?]. Networks are typically reconstructed by thresholding the association matrix (either by some predefined threshold or such that a fixed link density is obtained). In interpreting such networks, it is important to take into account that the network comes from only pairwise associations. For example, the basic principle of transitivity of correlation leads to a lot of spurious links strongly affecting network measures such as the average path length. Typically, short path lengths in these networks are related to the global efficiency of information transfer, e.g., in the brain [?], but also in climate [?]. But the authors in Refs. [76, 77] have shown that even for a set of entirely independent processes a small world topology (i.e., small average path length and high clustering of the network) emerges. Further, the robustness of a system to random error or perturbations is typically associated with a high clustering coefficient. This measure, too, can lead to false interpretations if causality is not taken into account: For example, for the true causal relations $X \to Y \to Z$, there are significant correlations between all pairs and the clustering coefficient of the non-causal network would be maximal. In this simple example an 'attack' on node $Y$ in the center certainly disrupts the causal network most because it also destroys the interlink between $X$ and $Z$. But this is not taken into account if the non-causal network is analyzed. In recent years some studies in neuroscience have also applied linear Granger causality methods [78, ?] and bivariate transfer entropy has been applied to climate time series [?].

With the measures ITX / MITP and IIX / MII, one can make an attempt to put the notion of shortest paths in an information-theoretic perspective. Instead of counting shortest paths between $X$ and $Y$, ITX or MITP give an appropriate measure of how much information is actually transferred. The interaction informations IIX or MII can then be seen as an alternative to betweenness centrality [75, 80], originally defined as

$$B(k) = \sum_{i, j \neq k} \frac{n_{sp}(k)}{n_{sp}}, \tag{44}$$

where $n_{sp}$ is the total number of shortest paths from node $i$ to node $j$ and $n_{sp}(k)$ is the number of those paths that pass through $k$. In analogy, one can define an aggregated IIX node measure, causal interaction betweenness (CIB), as

$$I^{\mathrm{CIB}}_k = \frac{1}{|C_k|} \sum_{(i,j,\tau) \in C_k} \left| I^{\mathrm{IIX}}_{i \to j \mid k}(\tau) \right|, \tag{45}$$

where $C_k$ is the set of interactions between all non-identical pairs of processes $(i, j)$ at all lags $0 < \tau \le \tau_{\max}$ where $k \neq i, j$ is an intermediate process (at any lags) and $|C_k|$ denotes its cardinality. Here we take the absolute value $|I^{\mathrm{IIX}}_{i \to j \mid k}(\tau)|$, but one could further distinguish between mediating (positive interaction information) and counteracting (negative interaction information) effects. A linear application of such an approach is discussed in Ref. [?]. Instead of IIX, MII can also be used to exclude further biasing confounders at the price of a much higher estimation dimension. Note that $|I^{\mathrm{IIX}}_{i \to j \mid k}(\tau)|$ does not denote a fraction like $\frac{n_{sp}(k)}{n_{sp}}$, and a more analogous measure to betweenness centrality would be obtained by normalizing each summand in Eq. (45) by the corresponding ITX or MITP,

$$\frac{1}{|C_k|} \sum_{(i,j,\tau) \in C_k} \frac{\left| I^{\mathrm{IIX}}_{i \to j \mid k}(\tau) \right|}{I^{\mathrm{ITX}}_{i \to j}(\tau)}, \tag{46}$$

which is, however, not robust to outliers for small ITX.
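
As a hedged illustration of this aggregation, the sketch below averages absolute interaction information values per intermediate process and, optionally, normalizes each summand by the corresponding ITX. The dictionary layout, the function name, and all numbers are hypothetical placeholders, not results from any analysis.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Interaction = Tuple[str, str, int]     # (source i, target j, lag tau)
Mediated = Tuple[str, str, int, str]   # (source i, target j, lag tau, mediator k)


def causal_interaction_betweenness(
        iix: Dict[Mediated, float],
        itx: Optional[Dict[Interaction, float]] = None) -> Dict[str, float]:
    """Average |IIX| per intermediate process k; if itx is given, every summand
    is divided by the ITX of the same interaction (normalized variant)."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for (i, j, tau, k), value in iix.items():
        term = abs(value)
        if itx is not None:
            term /= itx[(i, j, tau)]
        sums[k] += term
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Made-up values in nats, purely for illustration.
IIX_VALUES: Dict[Mediated, float] = {
    ("A", "C", 2, "B"): 0.10,
    ("A", "D", 3, "B"): -0.06,
    ("A", "D", 3, "C"): 0.02,
}
ITX_VALUES: Dict[Interaction, float] = {
    ("A", "C", 2): 0.20,
    ("A", "D", 3): 0.12,
}

print(causal_interaction_betweenness(IIX_VALUES))
print(causal_interaction_betweenness(IIX_VALUES, ITX_VALUES))
```

The normalized variant divides by ITX, so very small ITX estimates inflate individual summands, which is the lack of robustness mentioned above.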


VII. APPLICATION TO CLIMATOLOGICAL TIME 
SERIES 

To illustrate the causal pathway analysis on real data as well, we analyze a climatological dataset of daily mean sea level pressure anomalies (time series with the seasonal cycle removed) in the winter months (November to April) of 1997-2003 [?] at four locations in Eastern Europe indicated as A, B, C, D on the map in Fig. 6(d), which was also analyzed in Ref. [20]. Figure 6(a) depicts the time series. We find that our novel approach of determining not only the information transfer between two processes as in previous work, but also quantifying the exact causal information pathway is especially
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Figure 6. (Color online) Analysis of daily time series of mean sea level pressure with T = 1268 days. The algorithm to estimate the parents 
and neighbors was run as in Ref. 1201 using a threshold 1* — 0.015 nats, Tmax = 4 days and the CMI nearest-neighbor parameter k — 100 
(larger k have smaller variance which is important for independence tests), (a) Anomaly time series (days in winter months November to April 
only) of the four variables, all units are in hectopascal (hPa) relative to the seasonal mean, (b) Lag functions of MI and multivariate MIT, here a 
parameter A: = 10 was chosen to reduce the bias. Also contemporaneous MITs as defined in Ref. BSl are shown. All (C)MI values have been 
rescaled to the (partial) correlation scale via 1 —> \/l — £ [0,1] (4). The solid lines denote the fixed threshold 7* = 0.015 (rescaled), 

which is used to define the time series graph for the path-analysis, (c) shows the time series graph with the edge color denoting the rescaled 
MIT strength. Note the different order of the variables to better visualize the causal paths. Repetitions of links emanating from times further 
than f — 4 in the past are omitted, (d) Aggregated visualization as process graph (labels denote the lags and edge and node colors correspond 
to cross-MIT and auto-MIT, respectively, at the lag with maximum value). 


helpful here and reveals the circular dynamics of the atmo¬ 
spheric processes in this region. 
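
The rescaling of (C)MI values to the (partial) correlation scale used for Fig. 6 follows from the Gaussian relation $I = -\frac{1}{2}\ln(1-\rho^2)$, which inverts to $|\rho| = \sqrt{1 - e^{-2I}}$. A minimal sketch is given below; the function names are hypothetical, and clipping slightly negative estimates to zero is an assumption made here for convenience, not a step described in the text.

```python
import math


def rescale_to_correlation_scale(cmi_nats):
    """Map a (conditional) mutual information in nats to [0, 1]."""
    # Assumption: finite-sample estimates may be slightly negative; clip at 0.
    cmi_nats = max(cmi_nats, 0.0)
    return math.sqrt(1.0 - math.exp(-2.0 * cmi_nats))


def gaussian_mi(rho):
    """Mutual information (nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * math.log(1.0 - rho ** 2)


# Round trip for a Gaussian pair: the rescaled MI recovers |rho|.
recovered = rescale_to_correlation_scale(gaussian_mi(0.5))

# The fixed threshold I* = 0.015 nats on the rescaled axis.
threshold_rescaled = rescale_to_correlation_scale(0.015)
```

For non-Gaussian dependencies the rescaled value is no longer a correlation coefficient, but it still provides a monotonic map of (C)MI onto the unit interval.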

The reconstruction of the causal links with the PC-algorithm was discussed in Ref. [20]; here we use it in a two-step approach. First, we estimate the preliminary parents and neighbors of all four variables with the causal algorithm as in Ref. [20] using a fixed significance threshold $I^* = 0.015$ nats. These are $\mathcal{P}_A = \{\ldots, B_{t-1}\}$, $\mathcal{N}_A = \{C_t\}$, $\mathcal{P}_B = \ldots$, $\mathcal{N}_B = \{D_t\}$, $\mathcal{P}_C = \{C_{t-1}, \ldots\}$, $\mathcal{N}_C = \{A_t\}$, $\mathcal{P}_D = \ldots$, and $\mathcal{N}_D = \{B_t\}$. Secondly, we use these parents and neighbors to estimate MIT values for all links, which are plotted in Fig. 6(b) next to MI. Contemporaneous MIT values, which also use the neighbors as a condition as defined in Ref. BSl, are shown as well. MIT values above the same fixed significance threshold $I^* = 0.015$ nats are now considered as the causal links (directed, and contemporaneous for $\tau = 0$) defining the time series graph shown in Fig. 6(c). We checked that contemporaneous links do not disappear if the contemporaneous neighbors are excluded from the condition in MIT (corresponding to dashed links in Def |^). From this graph one can now read off the parents $\mathcal{P}$ and neighbors $\mathcal{N}$
used in the path-based information transfer measures.

Causal path                          ITX            IIX            MITP           MII
$D_{t-2} \to \cdots \to A_t$         0.09 ± 0.06                   0.15 ± 0.02
  via $B_{t-1}$                                     0.00 ± 0.06                   0.14 ± 0.02
$D_{t-2} \to \cdots \to C_t$         0.26 ± 0.02                   0.24 ± 0.02
  via $D_{t-1}$                                     0.22 ± 0.02                   0.23 ± 0.02
  via $C_{t-1}$                                     0.16 ± 0.02                   0.18 ± 0.02
$D_{t-3} \to \cdots \to C_t$         0.25 ± 0.02                   0.22 ± 0.02
  via $(D_{t-2}, D_{t-1})$                          0.22 ± 0.02                   0.21 ± 0.02
  via $B_{t-2}$                                     0.15 ± 0.02                   0.13 ± 0.02
  via $A_{t-1}$                                     0.09 ± 0.02                   0.12 ± 0.02
  via $(C_{t-2}, C_{t-1})$                          0.22 ± 0.02                   0.20 ± 0.02

Table II. Measures of information transfer along selected causal paths for the climatological example of Fig. 6. ITX and MITP are given per causal interaction; IIX and MII are given per intermediate node ("via"). All (C)MI values have been rescaled to the (partial) correlation scale via $I \to \sqrt{1 - e^{-2I}} \in [0,1]$ [4]. The estimation parameter $k = 10$ was chosen as a compromise between low bias and not too high variance; the 68% confidence interval is based on a bootstrap with 1000 samples.

This graph also helps to understand why MI has strongly significant values in Fig. 6(b) where MIT is zero. For example, the MI values in panel $C \to D$ can well be explained by past values of $D$, e.g., $D_{t-2}$ acting as a common driver via $D_t \leftarrow D_{t-1} \leftarrow D_{t-2} \to C_{t-1}$.
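
The common-driver effect described here can be reproduced in a toy setting. The sketch below simulates a hypothetical linear Gaussian model in which $D$ is autocorrelated and drives $C$ at lag one, while $C$ has no influence on $D$; the coefficients, the sample size, and the use of linear partial correlation as a stand-in for the nearest-neighbor CMI estimator are all assumptions made for illustration. The conditions mimic those of MIT for the pair $C_{t-1}$, $D_t$: the parent of $D_t$ and the parents of $C_{t-1}$.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, conds):
    """Partial correlation of x and y given conds (recursive formula)."""
    if not conds:
        return corr(x, y)
    z, rest = conds[0], conds[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy model: D_t = 0.8 D_{t-1} + noise,
# C_t = 0.5 C_{t-1} + 0.6 D_{t-1} + noise (no link from C to D).
rng = random.Random(0)
T = 5000
D, C = [0.0], [0.0]
for t in range(1, T):
    D.append(0.8 * D[t - 1] + rng.gauss(0.0, 1.0))
    C.append(0.5 * C[t - 1] + 0.6 * D[t - 1] + rng.gauss(0.0, 1.0))

D_t, D_tm1, D_tm2 = D[2:], D[1:-1], D[:-2]
C_tm1, C_tm2 = C[1:-1], C[:-2]

# Unconditional dependence between C_{t-1} and D_t (analogue of MI)
marginal = corr(C_tm1, D_t)
# Dependence after conditioning on the parents of both (analogue of MIT)
momentary = partial_corr(C_tm1, D_t, [D_tm1, C_tm2, D_tm2])
```

In this toy model `marginal` is clearly nonzero even though no link from $C$ to $D$ exists, while `momentary` is close to zero, in line with the interpretation given for the panel $C \to D$.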


In the following, we conduct a causal path analysis for the influence of $D$ on $A$ and $C$ at different lags. There are significant ITX values at two and three days lag. From the time series graph (Fig. 6(c)) we can read off the causal paths contributing to the ITX values. In Tab. II we list the results of an analysis for three causal path interactions. The interaction $D_{t-2} \to \cdots \to A_t$ has only one causal path via $B_{t-1}$, but also contemporaneous sidepaths $D_{t-2} - B_{t-2} \to \cdots \to A_t$. Here ITX and IIX gave very noisy results (large confidence bounds). MITP, on the other hand, is larger than ITX (as expected from Theorem 1) with a much smaller confidence interval. Here MII via $B_{t-1}$ explains all of the MITP within error bounds, as expected since it is the only intermediate node and no direct link exists. Next, we turn to the more interesting influence of $D$ on $C$. At a lag of two days MITP is slightly smaller than ITX, which, as discussed in Sect. V A, is due to finite sample bias. The indirectness of the interaction $D_{t-2} \to \cdots \to C_t$ here stems from the two paths $D_{t-2} \to D_{t-1} \to C_t$ and $D_{t-2} \to C_{t-1} \to C_t$ via autodependencies (Fig. 6(c)). The interaction analyses with IIX and MII here both indicate that a slightly larger part of the ITX is mediated via $D_{t-1}$ rather than $C_{t-1}$ (Tab. II), in line with the higher auto-MIT strength of the autodependency within $D$. At a lag of three days the interaction $D_{t-3} \to \cdots \to C_t$ has many more paths, not only via autodependencies, but also via $B_{t-2}$ and $A_{t-1}$ (and also non-causal contemporaneous sidepaths). While also here the autodependencies together with the direct link $D_{t-1} \to C_t$ strongly contribute to ITX (Tab. II), the path $D_{t-3} \to B_{t-2} \to A_{t-1} \to C_t$ seems to be relevant, too, as indicated by the significant IIX and MII values through these nodes.


This causal picture of a counter-clockwise ‘flow of entropy’ is consistent with the dynamical processes governing the lower and middle atmosphere circulation in the considered area. One usually observes a superposition of westerly winds with traveling extratropical counter-clockwise cyclones that traverse the area and whose trajectories are regulated by the aforementioned westerlies 18^ . Consistent with the causal lags of one or two days, these processes act on short daily time scales. Note that the variables were defined in an ad-hoc manner by the locations of grid points here, but one can better isolate subprocesses of complex systems by a suitable dimension reduction, see ifTOl [8^ for an application to the global atmospheric pressure system.


VIII. CONCLUSIONS 

This work expanded the approach introduced in Ref. [48], which considered information-theoretic measures to quantify the strength of links in causal time series graphs. Here the goal was to quantify indirect causal interactions and how much intermediate processes mediate or counteract an interaction. Our approach is more focused on a detailed picture of an interaction mechanism between two variables and complements concepts aimed at decomposing predictive information about a target variable $Y$.

The two considered pairs of measures ITX / IIX and MITP / MII for a causal interaction $X_{t-\tau} \to \cdots \to Y_t$ have in common the idea to extract information originating in process $X$ only at the lagged time $t - \tau$ and are conditioned in order to measure only information transfer along causal paths. MITP further attempts to exclude the influence of other drivers of $Y$ or intermediate path nodes by conditioning out the parents of all processes involved in the causal interaction. As a further step, IIX and MII quantify the mediating or counteracting effect of intermediate processes on causal paths to an interaction mechanism to determine the relative importance of pathways of causal information transfer. In extensions of the coupling strength autonomy theorem [48], for certain model classes MITP and MII allow one to entirely isolate the quantification of the interaction mechanism from other driving mechanisms. Then the values of MITP and MII can be solely related to the coefficients belonging to the indirect interaction mechanism between $X$ and $Y$, making them well interpretable not only information-theoretically, but also in relation to the underlying dynamics.

Generally, however, the value of MIT or MITP remains hard to interpret for nonlinearly intertwined complex systems, but their information-theoretic definition and foundation based on the Markov structure of the process allows one to quantify a rigorous notion of causal information transfer as an abstraction of the dynamics. The novel measures can also be helpful in understanding dynamical interactions in toy models from nonlinear dynamics. While the absolute values of ITX and MITP, measured in nats, cannot be simply related to units of the variables like linear measures, the values of the interaction measures IIX and MII can be used to quantify how much of the information transfer can be attributed to individual intermediate processes. The goal of information-theoretic measures is not a complete understanding of the dynamics of the system, which can only be achieved by experiments or detailed modeling. For such an understanding, causal effect quantifiers such as proposed in Pearl ifT^ or [22] are good starting points.

The climatological analysis underlines the importance of inferring mechanism delays and pathways for physical interpretations and serves as a first step to study more complex systems in climate and beyond. More exploratory studies in the spirit of functional network analysis, but with a rigorous definition of information transfer, can be based on the aggregate measures introduced in Sect. VI C. A linear application of such an approach is demonstrated in Ref. Qo). As a further outlook, it will be an interesting avenue of research to connect the time series graph-based framework of information transfer to recent concepts of synergistic information sharing [67, 68]. In Ref. Il6^ synergistic effects are studied with respect to optimal prediction schemes.


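Appendix A: Proofs of theorems

1. Proof of Inequality Theorem 1

The Inequality Theorem 1 can be proven similarly to the inequalities among ITY and MIT in Ref. [48]. To simplify notation, we drop the time indices and write $X$ for $X_{t-\tau}$, $Y$ for $Y_t$, $\mathcal{N}_X$ for $\mathcal{N}_{X_{t-\tau}}$, and $\mathcal{C}_{X \to Y}$ for $\mathcal{C}_{X_{t-\tau} \to Y_t}$.

Proof. We define $\mathcal{V}$ to be the set of parents of both $Y$ and the path nodes $\mathcal{C}_{X \to Y}$ (including $X$) that is not already included in the conditions of ITX $(\mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X))$, i.e., $\mathcal{V} = (\mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y})) \setminus (\mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X))$. Then it generally holds that $I(X; \mathcal{V} \mid \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)) = 0$. Firstly, all paths arriving at $X$ from the past are surely blocked (see Sect. III B) by $\mathcal{P}_X$ because they contain the motifs $\to \bullet \to X$ or $\leftarrow \bullet \to X$, which are both blocked. Further, also contemporaneous sidepaths are blocked by $(\mathcal{N}_X, \mathcal{P}(\mathcal{N}_X))$ and there are also no directed causal paths from $X$ to any node in $\mathcal{V}$ since, by definition, such a node would belong to $\mathcal{C}_{X \to Y}$. We now apply the chain rule on the (multivariate) CMI $I(X; (Y, \mathcal{V}) \mid \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X))$ twice:

\begin{align}
I(X; (Y, \mathcal{V}) \mid \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X))
&= I(X; Y \mid \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)) + \underbrace{I(X; \mathcal{V} \mid \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X), Y)}_{\geq 0} \tag{A1} \\
&= \underbrace{I(X; \mathcal{V} \mid \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X))}_{= 0} + I(X; Y \mid \mathcal{V}, \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)) \tag{A2} \\
\Longrightarrow \quad I(X; Y \mid \mathcal{V}, \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X))
&= \underbrace{I(X; Y \mid \mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}), \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X))}_{I^{\rm MITP}_{X \to Y}} \nonumber \\
&\geq I(X; Y \mid \mathcal{P}_X, \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)). \tag{A3}
\end{align}

2. Further information-theoretic properties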


Some further fundamental properties of information-theoretic quantities are important for the coupling strength autonomy theorems. The data processing inequality m states that

$$ I(X; f(Y) \mid Z) \leq I(X; Y \mid Z), \tag{A4} $$

i.e., manipulating $Y$ (which can also be a vector) by some function $f$ can only reduce the shared information. Note, however, that equality holds for smooth uniquely invertible transformations such as linear rescalings of $X$, $Y$ or $Z$ under which CMI is invariant ll5^ . For random variables $Y$ and $W$ and an arbitrary function $f$ we have that

\begin{align}
H(Y + f(W) \mid W) &= \int p(w)\, H(Y + f(W) \mid W = w)\, dw \nonumber \\
&= \int p(w)\, H(Y \mid W = w)\, dw \nonumber \\
&= H(Y \mid W), \tag{A5}
\end{align}

because $f(W)$ for $W = w$ is a fixed constant and entropies are translationally invariant. In particular, $H(f(W) \mid W) = 0$. This property also holds for the joint entropy and with another arbitrary function $g$ it follows for CMI that

$$ I(X + g(Z); Y + f(W) \mid Z, W) = I(X; Y \mid Z, W). \tag{A6} $$

Also here, $I(X; f(W) \mid W) = 0$. Last, conditions that are conditionally independent of the joint vector $(X, Y)$ given $Z$ can be dropped:

$$ I((X, Y); W \mid Z) = 0 \;\Longrightarrow\; I(X; Y \mid W, Z) = I(X; Y \mid Z), \tag{A7} $$

which can be derived from the fundamental decomposition and weak union properties of conditional independence relations. This relation also holds without the condition on $Z$.
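
The invariance in Eq. (A6) can be checked numerically in the special case of jointly Gaussian variables and linear functions $f$, $g$, where CMI has the closed form $I = -\frac{1}{2}\ln(1 - \rho^2)$ with $\rho$ the partial correlation. The sketch below is only such a special-case check, not a proof of the general statement; the covariance matrix, the coefficients of the linear functions, and all function names are hypothetical.

```python
import math


def transform_cov(cov, A):
    """Covariance of the linearly transformed vector A v, i.e. A cov A^T."""
    n = len(cov)
    AC = [[sum(A[i][k] * cov[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(AC[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def partial_corr(cov, i, j, conds):
    """Partial correlation of variables i and j given the indices in conds."""
    if not conds:
        return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])
    k, rest = conds[0], conds[1:]
    r_ij = partial_corr(cov, i, j, rest)
    r_ik = partial_corr(cov, i, k, rest)
    r_jk = partial_corr(cov, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def gaussian_cmi(cov, i, j, conds):
    """CMI in nats between Gaussian variables i and j given conds."""
    rho = partial_corr(cov, i, j, conds)
    return -0.5 * math.log(1.0 - rho ** 2)


# Hypothetical covariance of (X, Y, Z, W); strictly diagonally dominant,
# hence positive definite.
cov = [[1.0, 0.4, 0.3, 0.2],
       [0.4, 1.0, 0.1, 0.4],
       [0.3, 0.1, 1.0, 0.2],
       [0.2, 0.4, 0.2, 1.0]]

# X' = X + g(Z) with g(Z) = 2 Z and Y' = Y + f(W) with f(W) = -3 W
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, -3.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
cov_shifted = transform_cov(cov, A)

cmi_before = gaussian_cmi(cov, 0, 1, [2, 3])
cmi_after = gaussian_cmi(cov_shifted, 0, 1, [2, 3])
mi_before = gaussian_cmi(cov, 0, 1, [])
mi_after = gaussian_cmi(cov_shifted, 0, 1, [])
```

Here `cmi_before` and `cmi_after` agree up to floating-point error, whereas the unconditional values `mi_before` and `mi_after` differ, which illustrates that the invariance relies on conditioning on $Z$ and $W$.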


3. Proof for momentary information transfer along paths 


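Also here, to simplify notation we drop the time indices and write $X$ for $X_{t-\tau}$, $Y$ for $Y_t$, $\mathcal{N}_X$ for $\mathcal{N}_{X_{t-\tau}}$, and $\mathcal{C}_{X \to Y}$ for $\mathcal{C}_{X_{t-\tau} \to Y_t}$. In the theorem, we denoted those parents of $Y$ that are in the path nodes $\mathcal{C}_{X \to Y}$ defined in Eq. © as $\mathcal{P}^{\mathcal{C}}_Y = \mathcal{P}_Y \cap \mathcal{C}_{X \to Y}$ and correspondingly for other path nodes $\mathcal{P}^{\mathcal{C}}_i$ indexed by $i$. Also note that $X$ is included in the set of path nodes.

Proof. We insert the dependencies assumed for $X$ and $Y$ in Eq. (35) in the definition of MITP (Eq. (fTS])):

\begin{align}
I^{\rm MITP}_{X \to Y}
&= I(X; Y \mid \mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}), \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)) \tag{A8} \\
&= I(g_X(\mathcal{P}_X) + \eta^X; f_Y(\mathcal{P}^{\mathcal{C}}_Y) + g_Y(\mathcal{P}_Y \setminus \mathcal{P}^{\mathcal{C}}_Y) + \eta^Y \mid \mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}), \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)) \tag{A9} \\
&= I(\eta^X; f_Y(\mathcal{P}^{\mathcal{C}}_Y) + \eta^Y \mid \mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}), \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)), \tag{A10}
\end{align}

where the last step uses Eq. (A6), since $\mathcal{P}_X$ and $\mathcal{P}_Y \setminus \mathcal{P}^{\mathcal{C}}_Y$ are contained in the conditions.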

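In the theorem, $f_Y$ is assumed linear and we also assumed all other path nodes $\in \mathcal{C}_{X \to Y}$ to linearly depend on each other by Eq. (36), where dependencies on external nodes were only assumed additive. Then,

$$ I^{\rm MITP}_{X \to Y} = I(\eta^X; f(\eta^X, \cup_i \eta^i) + \eta^Y \mid \mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}), \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)), \tag{A11} $$

for some linear function $f$, yielding Eq. (37).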

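Now under the “no contemporaneous dependency”-condition (34) it holds that $\mathcal{N}_X = \emptyset$ and further

$$ I\big((\eta^X, \eta^Y, \cup_i \eta^i); (\mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}))\big) = 0, \tag{A12} $$

which can be derived graph-theoretically exploiting Markov properties as follows: Firstly, since the noise terms $(\eta^X, \eta^Y, \cup_i \eta^i)$ of the path nodes in $\mathcal{C}_{X \to Y}$ and $Y$ are i.i.d., they are independent of all those processes in $(\mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}))$ with paths ending with a directed arrow at any of the path nodes $\mathcal{C}_{X \to Y}$ or $Y$. Secondly, by definition of $\mathcal{C}_{X \to Y}$ there are no directed paths from any node in $\mathcal{C}_{X \to Y}$ toward $(\mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}))$. Last, contemporaneous sidepaths from any node in $\mathcal{C}_{X \to Y}$ to $(\mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}))$ are excluded by the “no contemporaneous dependency”-condition (34).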

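Further, from Eq. (A12) we find that $I\big((\eta^X, f(\eta^X, \cup_i \eta^i) + \eta^Y); (\mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}))\big) = 0$ due to the data processing inequality (A4) and therefore we can drop the conditions due to Eq. (A7),

$$ I^{\rm MITP}_{X \to Y} = I(\eta^X; f(\eta^X, \cup_i \eta^i) + \eta^Y), \tag{A13} $$

yielding Eq. (38). Note that since the dynamical noise is i.i.d. and $0 < \tau_i < \tau$, it holds that $(\eta^X, \eta^Y) \perp\!\!\!\perp \eta^i \;\forall\, i$ and $\eta^X \perp\!\!\!\perp \eta^Y$. □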


This proof also includes the proof for the MIT coupling strength autonomy theorem as a special case, but in a much shorter form than in Ref. [48]: If $\mathcal{C}_{X_{t-\tau} \to Y_t} = \{X_{t-\tau}\}$, and under the “no sidepath”-constraint in Ref. [48], the conditions on the neighbors can be dropped and MITP collapses to MIT. Since then also $f(\eta^X_{t-\tau}, \cup_i \eta^i_{t-\tau_i}) = f(\eta^X_{t-\tau})$, Eq. (38) reduces
to the same form as in Ref. 


4. Proof for momentary interaction information 


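Using the same assumptions as for Theorem]^ the dependencies of momentary interaction information between $X$, $Y$ and intermediate processes $W = (\ldots) \in \mathcal{C}_{X_{t-\tau} \to Y_t} \setminus \{X_{t-\tau}\}$ indexed by $j$ can be simplified exploiting the same arguments as above.

Proof.

\begin{align}
I^{\rm MII}_{X \to Y | W}
&= I(X; Y; W \mid \mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}), \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)) \tag{A14} \\
&= I\big(\eta^X; f(\eta^X, \cup_i \eta^i) + \eta^Y; \{\eta^j + f_j(\eta^X, \cup_{i \neq j} \eta^i)\}_j \mid \mathcal{P}_Y \setminus \mathcal{C}_{X \to Y}, \mathcal{P}(\mathcal{C}_{X \to Y}), \mathcal{N}_X, \mathcal{P}(\mathcal{N}_X)\big) \tag{A15} \\
&= I\big(\eta^X; f(\eta^X, \cup_i \eta^i) + \eta^Y; \{\eta^j + f_j(\eta^X, \cup_{i \neq j} \eta^i)\}_j\big), \tag{A16}
\end{align}

where the last step is valid only under the “no contemporaneous dependency”-condition Eq. (34), giving Eq. (40) with linear functions $f$, $f_j$. □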
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